PROGRAM SUMMARY


United Technologies Research Center (UTRC) has developed a framework to integrate state-of-the-art 
rocket engine technology with fault detection algorithms for a Health Management System (HMS) for the 
space shuttle main engine (SSME). UTRC has developed this HMS framework on the basis of an analysis of 
the SSME failure modes and the engine monitoring requirements. The UTRC HMS utilizes the existing 
flight instrumentation and satisfies NASA requirements of using near-term technologies to enable ground 
testing of the HMS within five years. The UTRC HMS is initially targeted to support SSME ground tests; 
however, the system design provides for a clear migration path to a flight system. UTRC has also developed an 
implementation plan for the HMS, which provides for phased implementation and integration of the HMS on 
the SSME teststand. 

The HMS framework development process drew upon numerous resources such as SSME design and 
failure history, SSME operations and teststand procedures, and SSME teststand data. UTRC evaluated a 
broad range of fault detection algorithms, sensor technologies, and hardware architectures before selecting 
the most promising algorithms, sensors, and hardware architecture, consistent with the NASA program goals, 
to incorporate into the HMS framework. 

To establish the requirements for the failure detection methods, UTRC first analyzed the SSME failure 
modes, available teststand data, SSME models, and SSME operations. The facts that the SSME analytical 
models are directed at performance analysis and that the teststand data are directed at performance 
measurements rather than diagnostics put constraints on the selection of failure detection methods. The 
SSME Power Balance and Digital Transient models are very complex, and cannot be easily modified to 
simulate engine failures. Furthermore, the available SSME teststand data were primarily from the development 
phase of the SSME program, and thus exhibited significant test-to-test variation due to hardware design and 
build changes. The complexity of the analytical models and the variability of the teststand data dictated that 
empirical/data-driven fault detection techniques be selected over techniques which require accurate 
analytical engine models for all conditions of interest. 

A major objective of this program was to assess and evaluate candidate approaches to detect SSME 
failures earlier than redline cutoff. The approaches included a study of fault detection algorithms, along with 
an assessment of existing and near-term sensor technologies that could be used to augment the performance 
of the HMS. The fault detection approach developed by UTRC uses algorithms and a system hierarchy which 
exploit the interrelatedness of the SSME components and parameter measurements, and provides a methodology 
which is robust to sensor loss and engine build variability while covering all phases of engine operation. 

A set of three algorithmic approaches was developed and implemented to detect faults which manifest 
themselves in engine parameter measurements as gradual long term trends, quick and high amplitude 
excursions, and oscillatory or nonsteady state behavior. Autoregressive Moving Average (ARMA) models, 
based on time series analysis, were developed to detect fast excursions in engine parameters and also changes 
from a stationary to a nonstationary condition. The UTRC Recursive Structural Identification (RESID) 
algorithm was used to develop a regression model between the SSME propellant flows and the thrust chamber 
pressure during open-loop start and shutdown sequences. This RESID model was then used to detect 
abnormal behavior during startup and shutdown. A sensor fusion technique, clustering, was developed to 
detect failures which manifest themselves as gradual trends in performance parameter measurements and to 
distinguish this behavior from trends associated with normal engine operation. Because it exploits the 
interrelationships of the sensor measurements, the clustering technique is robust to sensor loss and 
build-to-build variations in the SSME. The three algorithmic approaches were tested on data from 15 SSME 
failures and provided fault detection times equal to or better than redline cutoff times for fourteen of the 
fifteen cases. 
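
As an illustration of the time-series idea behind the first approach, the sketch below fits a first-order autoregressive model to a nominal stretch of one parameter and flags the first sample whose one-step prediction error exceeds a multiple of the nominal residual scatter. It is a simplified stand-in and not the UTRC ARMA implementation: the moving-average terms are omitted, and the model order, the threshold multiplier `k`, the function names, and the synthetic signal are all assumptions made for the example.

```python
import random
import statistics


def fit_ar1(x):
    """Least-squares fit of x[t] - mu = a * (x[t-1] - mu) + e[t] on nominal data."""
    mu = statistics.fmean(x)
    d = [v - mu for v in x]
    a = sum(d[t] * d[t - 1] for t in range(1, len(d))) / sum(v * v for v in d[:-1])
    res = [d[t] - a * d[t - 1] for t in range(1, len(d))]
    return mu, a, statistics.pstdev(res)


def first_alarm(x, mu, a, sigma, k=6.0):
    """Index of the first sample whose prediction error exceeds k * sigma, else None."""
    for t in range(1, len(x)):
        predicted = mu + a * (x[t - 1] - mu)
        if abs(x[t] - predicted) > k * sigma:
            return t
    return None


# Synthetic, hypothetical signal: a steady level with unit-variance noise.
rng = random.Random(0)
nominal = [3000.0 + rng.gauss(0.0, 1.0) for _ in range(500)]
mu, a, sigma = fit_ar1(nominal)

# A second synthetic record with a fast excursion injected at sample 200.
faulty = [3000.0 + rng.gauss(0.0, 1.0) for _ in range(300)]
for t in range(200, 300):
    faulty[t] += 25.0
alarm_index = first_alarm(faulty, mu, a, sigma)
```

The threshold is expressed in units of the nominal residual scatter so that the same rule could, in principle, be reused across parameters with different engineering units.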

Recommendations for improving the quality of information available to the fault detection algorithms as 
well as for extracting more information from the existing sensor suite were presented. The addition of new 
nonintrusive sensors directed specifically at component health assessment was suggested for long-term 
incorporation within the SSME. Near-term enhancements could also be derived by exploiting the higher 
frequency bandwidths of the pressure transducers currently installed in the SSME. The computational 
requirements for each of these approaches were assessed for incorporation in the HMS architecture. 

A design methodology was developed and demonstrated to map the HMS functionalities onto a 
hardware architecture. This design methodology provides a modular, flexible approach to architecture design, 
thereby delineating a clear migration path from the ground test HMS to a flight system. An implementation 
plan, developed for the groundtest breadboard HMS architecture, provides a phased implementation of HMS 
functionalities on the teststand, and includes the program schedules, manpower requirements, and materials 
cost. 


To demonstrate the benefits of an HMS for the SSME, the program required the provision of a measure 
of HMS effectiveness. UTRC selected seven criteria which encompass nearly all performance, reliability, and 
implementability issues. The first criterion was the probability of a fault detection. The UTRC HMS 
demonstrated 100% detection for the data that was available. The second criterion was a low false alarm rate. 
For the data tested to date, the UTRC algorithms exhibited only three false alarms during power transitions 
which were caused by missing data from one or more highly weighted sensors. It is anticipated this small 
number of false alarms can be further reduced through slight adjustments in the algorithms. The third 
criterion, time of detection, was demonstrated by the UTRC HMS through fault detection times which were, in 
most cases, better than redlines. The fourth criterion selected was the probability of a hardware failure. UTRC 
chose a high reliability design to minimize the chance of hardware failures, and a modular architecture to 
minimize the impact of such a failure in the unlikely event that one occurred. The fifth criterion, complexity, 
was minimized through the development of algorithms which require minimal processing of the SSME data. 
Feasibility of implementation, the sixth criterion, was maximized through the development of a phased 
implementation plan which provides benefits from the HMS early in the implementation program. Cost, 
which was the seventh criterion, was minimized through the selection of commercially available, industry 
standard hardware. 

The purpose of this study program was to demonstrate feasibility of and lay the foundation for a program 
to enhance the safety of SSME operation in both ground test and flight scenarios. The key issues of fault 
detection algorithms, hardware architectures, and implementation plans were successfully addressed. The 
results clearly indicate that it is feasible to use the existing flight SSME instrumentation as the basis for an 
HMS that can provide significant near-term improvements in operation safety. Furthermore, the flexibility of 
the approach developed in this program provides for ease of growth to incorporate and accommodate new 
advances in health monitoring technology, thus providing long-term enhancements to safety. 
SECTION 1.0 
INTRODUCTION 

The space shuttle main engine (SSME) is the first operational liquid rocket engine developed for reuse. 
NASA’s space exploration objectives rely heavily on the performance of the SSME. The SSME is a man-rated, 
power dense engine with high thrust requirements which are met by a staged combustion cycle operation. The 
SSME development began in the 1970s, and since then it has performed reliably and safely for over 30 space 
shuttle missions, and for over 411,000 seconds of ground testing. However, four minor failures have occurred 
during launches or launch attempts, and, out of 40 premature shutdowns, there have been 27 major failures 
during ground testing which resulted in substantial teststand and engine hardware damage. 

A major SSME failure has catastrophic consequences during flight, but it can also prove to be very 
expensive during ground test, not only in terms of the loss of hardware, but also the accumulated testing time 
on a given engine. Each SSME undergoes a significant amount of testing before it is qualified for flight. 
Engine monitoring, therefore, can play a significant role in providing improved availability, reliability, safety, 
and reduced cost to meet NASA’s space objectives. 

The current technique of using parameter redlines for monitoring the SSME is a sensor-intensive, 
algorithmically simple approach that is incapable of detecting incipient failures. Rocketdyne has developed 
and is implementing the System for Anomaly and Failure Detection (SAFD) to provide improved ground test 
monitoring for the SSME [1]. The SAFD algorithms use statistical confidence intervals on sensed parameters 
to achieve fault detection. SAFD algorithms are suitable only for mainstage operation of the SSME. In recent 
years, considerable advances have been realized in the theory and practice of failure detection algorithms for 
many different mechanical and electronic systems. These techniques cover a wide range of approaches from 
matched filters and Bayesian detectors to adaptive learning networks and artificial intelligence. The 
computational complexities of these techniques vary widely, although, with the continuing advances in the 
computing power of digital processors, most of these techniques are within reach of a real-time health 
monitoring system. UTRC has developed a framework for such a Health Management System for the SSME 
that integrates near-term sensor technologies with a failure detection methodology for early detection of 
faults. 
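
The contrast drawn above between a fixed redline and a statistically derived limit can be made concrete with a small sketch. It compares a fixed upper limit with a band of `K` standard deviations around the nominal mean on a slowly drifting parameter. This is only in the spirit of confidence-interval monitoring and is not the SAFD algorithm; the limit values, the multiplier `K`, and the data are hypothetical.

```python
import statistics


def first_exceedance(samples, is_abnormal):
    """Index of the first sample for which is_abnormal(value) is true, else None."""
    for i, value in enumerate(samples):
        if is_abnormal(value):
            return i
    return None


# Hypothetical nominal record used to characterize normal scatter.
nominal = [99.5, 100.5] * 50
mean = statistics.fmean(nominal)
spread = statistics.pstdev(nominal)

REDLINE = 130.0   # hypothetical fixed upper limit
K = 4.0           # hypothetical band half-width, in standard deviations

# Hypothetical incipient fault: a slow upward drift of 0.5 units per sample.
drifting = [100.0 + 0.5 * t for t in range(100)]

redline_trip = first_exceedance(drifting, lambda v: v > REDLINE)
band_trip = first_exceedance(drifting, lambda v: abs(v - mean) > K * spread)
```

In this contrived case the band trips at sample 5 and the redline at sample 61, which illustrates why a limit derived from observed scatter can respond to an incipient drift long before a fixed redline is reached.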

The framework developed by UTRC emphasizes an HMS focused to enhance safety. It uses existing 
instrumentation and near-term technology concepts to enable ground testing within five years. It is designed 
to initially support ground tests, with the clear capability to migrate to a flight system. The UTRC HMS 
framework incorporates fault detection algorithms, sensor technologies, and the SSME performance models, 
and maps these key components onto a hardware architecture consistent with the program goals. 

A detailed discussion on the HMS Framework developed by UTRC is presented in this final report. 
Section 2 describes the SSME database used to identify the SSME failure modes. The information about these 
failure modes was used to identify the monitoring requirements for the HMS. Section 3 of this report presents 
the fault detection algorithms and an evaluation of the near-term sensor technology. Section 4 discusses the 
hardware architecture followed by an implementation plan for the HMS in Section 5. Finally, Section 6 
discusses program conclusions, and is followed by Appendix A, in which a detailed discussion of HMS failure 
detection algorithm results is presented. 
SECTION 2.0 

IDENTIFICATION OF THE SSME FAILURE MODES 


The SSME entered developmental testing in 1975 and has since accumulated over 411,000 seconds of 
operation in over 1500 hot-fire tests. Although the engine has exhibited high reliability, 27 major incidents 
have occurred during ground test. Many of these failures have resulted in significant repair costs to engine and 
facility hardware, program schedule delays, and loss of fleet leader components. Extensive documentation 
consisting of teststand data and failure analysis reports exists for many of the incidents. These data can be used 
to guide the development of an HMS by identifying those failures whose detection prior to catastrophic failure 
would produce the greatest increase in operational safety of the SSME. Thus, Task 1 of the HMS Framework 
Development Program deals with the identification of the SSME failure modes. This task prepares the 
groundwork for establishing the HMS Framework by identifying the monitoring requirements for the SSME. 
In order to establish a systematic procedure for developing the HMS Framework, Task 1 characterizes the 
failure modes in terms of a set of criticality criteria. 

It was anticipated that certain fault-specific algorithms would be required to provide optimal detection. 
Characterization of the SSME failure modes was a necessary first step in the identification and the 
development of fault-specific algorithms. Although fault-specific detection algorithms were later deemed 
unnecessary, the information provided by the study of the SSME failure modes identified major SSME 
components whose operational health was required to be monitored by the HMS. 

2.1 SSME Database 

The successful design of an HMS for the SSME requires knowledge of the engine operation, engine 
failure modes, the existing sensor set, the parameters currently monitored, the ground test operations, and the 
processing capabilities. UTRC has assembled an SSME database which includes the SSME teststand data, 
SSME analytical models, and written SSME documents. 

2.1.1 SSME Teststand Data.— UTRC has acquired SSME test firing data tapes which include 16 failure 
incidences, 2 nominal firings, and 1 nominal cluster firing. The failure data cover a test period from 1977 to 
1989. The teststand data include CADS (Command and Data Simulator) data and Facility data. The CADS 
data represent data from the sensors mounted on the engine and sampled every 40 ms, while the Facility data 
consist of data from facility and engine mounted sensors, sampled every 20 ms. Table 2.1 presents the test 
numbers, along with the power level settings at which failures occurred, run time duration, and failure 
information for each of the data files. 
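
Because the two data sources are sampled at different rates, any analysis that combines them needs a common time base. The sketch below reduces a 20 ms Facility channel to the 40 ms CADS rate by averaging consecutive samples. It assumes that both records start at the same instant and are sampled uniformly, which the report does not state; the function name and sample values are hypothetical.

```python
CADS_PERIOD_MS = 40       # CADS sampling interval given in the text
FACILITY_PERIOD_MS = 20   # Facility sampling interval given in the text


def to_cads_rate(facility_samples):
    """Average consecutive Facility samples so that one value remains per CADS period."""
    step = CADS_PERIOD_MS // FACILITY_PERIOD_MS  # Facility samples per CADS sample
    usable = len(facility_samples) - len(facility_samples) % step
    return [
        sum(facility_samples[i:i + step]) / step
        for i in range(0, usable, step)
    ]


# Hypothetical Facility channel; the trailing unpaired sample is dropped.
facility = [10.0, 12.0, 11.0, 13.0, 12.0, 14.0, 99.0]
aligned = to_cads_rate(facility)
```

Averaging rather than simple decimation is a design choice made here to reduce noise; picking every second sample would preserve the raw values instead.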

Each CADS data file contains 130 columns of data, with each column identified by a CADS parameter 
identification (PID) number. There are many redundant sensor measurements. For example, Main 
Combustion Chamber Pressure is measured on 8 different channels. Also, some data columns contain 
information regarding vehicle interface or pogo connections. Consequently, out of the 130 columns of data, 
approximately 50 contain nonredundant information about engine performance. The facility data files consist 
of about 250 columns of data. As in the case of CADS data, there are many redundant sensor measurements, 
as well as measurements, such as air temperature, command words, purge system pressures, and bleed 
valve positions, that are not directly relevant to monitoring engine performance or engine failures. 

Each of the SSME test profiles can be divided into four operational phases: pre-startup, startup, 
mainstage, and shutdown. The pre-startup stage is not considered relevant for engine performance analysis. 
Table 2.1 SSME TESTSTAND DATA

          Power Level   Run Time
          at Failure    Duration    Failure Information                                                     Type of Data
Test #    (%)           (secs)      Component                 Cause                                         (CADS, Facility)

902-249   109           450.58      HPFTP                     Turbine Blades Cracked                        X
901-340   109           405.50      HPFTP                     Discharge Turnaround Duct Rupture             X
901-364   109           392.15      HPFTP                     Kaiser Hat Nut Design Defect                  X
901-436   109           611.06      HPFTP                     Coolant Liner Buckle                          X
901-110   75            74.07       HPOTP                     LOX Seal Burning                              X
901-225   100           255.61      Valve                     Main Oxidizer Valve Retaining Screw Failure   X
901-264   Startup       9.88        Control                   PC Sensor Failure                             X
901-173   92            201.17      Preburner & Main Burner   Main Injector Post Crack                      X
902-196   102           8.52        Preburner & Main Burner   Main Injector Post Crack                      X
901-222   Startup       4.33        Ducts                     Heat Exchanger Weld Rupture                   X
750-250   100           101.50      Ducts                     MCC Outlet Manifold Crack                     X
750-168   Shutdown      300.2       Valve                     OPOV Failure                                  X
901-307   109           75.03       Fuel Preburner            Injector Failure                              X
901-331   100           233.14      Main Burner               Injector Failure                              X
SF10-01   102           104.80      Fuel Preburner            Injector Failure                              X
SF6-01    100           18.58       Valve                     Main Fuel Valve Failure                       X

902-457   104-109       310.00      Nominal Operation                                                       X  X
CF902     100           678.16      Nominal Operation                                                       X
902-463   109-111       635.00      Nominal Operation                                                       X  X
During startup and shutdown, the SSME controller invokes open-loop, time sequenced regulation logic, while 
during mainstage operation, closed-loop feedback is provided. The SSME controller regulates engine thrust 
and oxidizer/fuel mixture ratio during mainstage operation by sensing the main combustion chamber pressure 
(MCC_PC) and the volumetric fuel flow rate (FU_FL). Control of these parameters is achieved by modulating 
the oxidizer and fuel preburner oxidizer valves (OPOV_ACT_POS and FPOV_ACT_POS). 

The parameter MCC_PC is proportional to the engine thrust, and hence, a plot of MCC_PC represents a 
thrust profile for a complete test. Figure 2.1 shows a sample test profile. Available test data show a wide 
variation in the test profiles in terms of power levels achieved, throttling and venting procedures, and sensors 
that were recorded during the tests. During normal operation, all CADS sensors show a strong correlation 
with MCC_PC. Most of the CADS data show stationary behavior during mainstage operation at a given 
power level. The significant changes in data values, during normal operation, are due to power level changes 
(Figure 2.2), or (based upon the conclusions in Rocketdyne Accident/Incident Reports) venting of the 
propellant tanks (Figure 2.3). Figure 2.4, however, shows that some sensor data, such as valve actuator 
positions or turbine discharge temperatures, do show fluctuating behavior during normal operations. 

During failure incidences, most of the CADS data remain stationary until a few seconds before the 
redline induced shutdown, at which time there is a sudden increase in data values for almost all the sensors. 
Only in a very few cases is there an early indication (about one hundred seconds before redline cutoff) of failure in 
terms of a large change in sensor value (Figure 2.5). Some of the sensor data, as shown in Figure 2.6, show 
gradual trends during failure incidences. 

2.1.2 SSME Analytical Models.— To understand the SSME behavior during normal or abnormal 
operation, UTRC has utilized analytical models of the SSME. Two SSME simulation models, the Power 
Balance Model (PBM) and the Digital Transient Model (DTM), and the SSME controller model run at UTRC. 
A third simulation model, Test Information Program (TIP88), runs at the UTC Pratt & Whitney (P&W) 
facility. 

The PBM models a “typical” SSME with a set of nonlinear equations and calculates the engine 
steady-state power balance through iterative techniques. The governing equations are focused upon a 
conservation of energy approach. The model progresses step by step through SSME sections and iterates 
parameters until pressures, temperatures, and flowrates for the section/assembly are continuous: the energy 
available, based upon these parameters, is equal to the energy required by the assembly. The PBM provides 
steady-state “design point” values for SSME operation from minimum power level of 50% rated thrust to full 
power level of 109% rated thrust, and at mixture ratios from 5.8 to 6.2. 

The DTM simulates the SSME through startup, mainstage, and shutdown operations. The model 
partitions the engine into a set of subsystems of component processes. These process elements are modeled 
with collections of equations which describe both the static and dynamic physical processes which occur in the 
engine subsystems. The DTM does not, however, model low frequency effects at a steady power level. 

The SSME controller model is based upon state-space and transfer function equations, and models the 
OPOV and FPOV commands and actuators. Actual test data for MCC_PC and calculated mixture ratio were 
used as inputs to the model which was implemented in an interactive data analysis package called MATLAB. 
Predicted OPOV and FPOV commands and positions were compared to measured commands and positions 
to establish whether trends in the data were due to controller actions. The slight variations about equilibrium 
or nominal valve positions were shown to be caused by controller actions; however, the model simulation was 
not conclusive in determining the cause of the long-term trends. 

The Test Information Program (TIP88) is an SSME steady-state model consisting of three separate 
sections: Data Reduction, Base Balance, and Rated Programs. The Data Reduction Program examines 
measured test data (CADS and FRS) to define the operating characteristics specific to that particular engine. 
The Base Balance Program calibrates the engine model by adjusting performance variables based upon the 
data reduction results. The Rated Program essentially serves as an engine specific PBM; the calibrated model 
provides steady-state simulation of the specific engine at different power levels. 

The process of seeding the analytical models with faults to examine fault propagation or verify 
algorithms was not accomplished due to the complexity of the models. The models, which are directed at 
modeling and evaluating engine performance, were used to provide design point inputs to the failure detection 
algorithms. 

2.1.3 SSME Documents.— In the process of building a knowledge base for the SSME, UTRC has 
accumulated a large number of documents including accident/incident reports, unsatisfactory condition 
reports (UCRs), controller specifications, operational descriptions, and numerous reports from industry and 
academia on health monitoring of SSME and liquid rocket engines in general. These references were essential 
for understanding the SSME, its failures, and the current state-of-the-art in diagnostics and sensor 
technologies. The database of information provided a base from which the design of the UTRC HMS 
stemmed. 

2.2 Major Component Classes of the SSME 

UTRC divided the numerous components that make up the SSME into “Major Component Classes” as 
opposed to Line Replaceable Units (LRUs) or structural components. Each major component class, listed in 
Table 2.2, is a grouping of components with similar functionality, such as valves or turbopumps. This 
approach was used because the line replaceability of a component is not significant in terms of the criticality of 
its failure or its effect on safety. Although line replaceability is important from the maintenance standpoint, 
maintenance issues were not addressed in this program. 

TABLE 2.2 - MAJOR SSME COMPONENT CLASSES 

• Turbopumps 

• Hot Gas Manifold 

• Main Combustion Chamber 

• Nozzle 

• Controller 

• Propellant Valves 

• Interconnects (Lines and Ducts) 

• Actuators 

• Sensors 

• Pogo Accumulator 

• Structural Connectors 

• Harnesses 

• Flight Accelerometer Safety Cutoff System 

• Pneumatic Controls 
Each major component class is comprised of “components” which were further divided into 
“subcomponents”. For example, the major component class of turbopumps has HPFTP, HPOTP, LPFTP, and 
LPOTP as components, and turbine blades, seals, and bearings as subcomponents. Table 2.3 lists the 
components of the major classes which have failed during the SSME ground testing. 

2.3 SSME Failure Modes 

The current Rocketdyne FMEA divides the numerous SSME failure modes into two categories: 1) 
Criticality 1 - Loss of life/vehicle, and 2) Criticality 3 - Other. Wide variations in the manifestation of each 
failure during the initial and intermediate stages exist. The detection of a failure by an HMS is determined by 
the physical phenomenon that gives rise to this failure, the speed and complexity of damage propagation, and 
the number and location of sensors that are able to detect the failure. 

Based upon the Rocketdyne FMEA, P&W-ATD FMEA, and the UCR CALSPAN database, the total 
number of SSME failure modes identified (about 900 failure modes with Rocketdyne-defined criticality 1) is 
very high. The near-term implementation of the HMS for the SSME ground tests eliminated some of the 
complex failure modes as being too difficult to manage given the state of the art in sensors, signal processing, 
and the limitations of the Block I/II controller design. The scope of this HMS program called for establishing 
the viability of a health management system by analyzing a small number of failure modes that have a direct 
impact on the engine safety and are manageable, i.e., failure modes which can be detected rapidly by an HMS so 
that the engine damage can be minimized. Thus, failure modes associated with design problems or material 
defects have been eliminated, as have the failure modes associated with fatigue cycles and remaining life, even 
though they may be important from a maintenance point of view. 

There have been 27 ground test firings classified as major failure incidences. Each failure incidence 
represents the occurrence of one or more failure modes of the SSME. The most common engine failure modes 
are associated with seals, valves, bearings, turbine blades, and ducts. High cycle fatigue in the injector LOX 
posts initiated the most failures, while the high pressure fuel turbopump was the initiator of the most major 
incidences. Figure 2.7 shows the major SSME components and their associated failure modes. 

2.4 Failure Mode Ranking 

The purpose behind the ranking of failure modes is twofold: 1) to characterize the failure modes in terms 
of engine safety and impact on the mission; and 2) to use the rankings to select a set of failure modes for further 
analysis in Task 2. In order to proceed with the analysis in Task 2, the selected set of failure modes needed to be 
correlated with the SSME test firing data (the failure incidences data). Although there can be several failure 
modes associated with each incidence, the mode that was the initiator was considered to be the most 
significant from the perspective of this study. Therefore, the ranking of the failure incidences is also the 
ranking of the primary failure mode associated with that incidence. In cases where more than one failure 
incidence for a given failure mode exists, the ranking score for the worst case failure incidence was used for the 
failure mode score. 


2.4.1 Ranking Criteria .- The following set of criteria was established to rank the major SSME failures. 
For each criterion, the higher the score, the more severe the failure. These criteria are consistent with the 
program goals to improve safety and minimize engine and teststand damage. 
Table 2.3 MAJOR INCIDENCE FAILURES

SSME MAJOR
COMPONENT CLASS  COMPONENT                                 SUBCOMPONENT          MECHANISM               TEST NO.

Turbopump        High pressure oxidizer turbopump (HPOTP)  Seal                  rubbing                 901-110
                                                           Bearings              improper coolant flow   901-136
                                                           Sensor                rubbing                 902-120
                                                           Turbine blade         unknown                 901-362
                 High pressure fuel turbopump (HPFTP)      Turn-around duct      fracture                901-340, 901-363
                                                           Coolant liner         buckle                  902-118
                                                           Seal                  fracture                901-436
                                                           Kaiser helmet         leak                    901-364
                                                           Kaiser hat nut        loss of                 902-209
                                                           Turbine blades        unknown                 902-249, 902-095, 901-346
                                                           Damper                loss of                 901-410
                                                           Bull nose nut         lost [illegible]        901-362
                                                           Turbine               seizure                 901-147
HGM              Main injector                             LOX post              fracture                901-173, 901-331, 750-148,
                                                                                                         901-183, 901-198
                 HPFTP/Fuel preburner (FPB)                Injector - LOX post   fracture                901-307
                                                           Injector              erosion                 SF10-01
                                                           Injector              blockage                750-160
                 Heat exchanger                            Coil leak             fracture                901-222
Sensors          Lee jet                                   Purge orifice         dislodge                901-284
Interconnects    High pressure oxidizer duct                                     fracture                750-175
                 Fuel inlet line                                                 blockage                902-112
                 Coolant outlet neck                                             fracture                750-259
Nozzles          Stacked nozzle                            Nozzle coolant tube   rupture                 901-485
Valves           Main fuel valve                           Housing               fracture                SF6-01
                 Main oxidizer valve                       Inlet joint           fretting                901-225
                                                                                 misindexing             902-132
                 Oxidizer preburner valve                  Seat                                          750-168






Fig. 2.7 FIVE GENERIC FAILURE MODES OF THE SSME
[Figure: HPOTP - bearing failures, seal failures; HGM - main injector failures, FPB injector failures; HPFTP - power transfer failures, seal failures]
2.4.1.1 Definition of Ranking Criteria 

I.    Severity of Damage - determined from the descriptions of the overall damage that resulted 
      from the incident. 

          3 - Loss of Engine 
          2 - Loss of One or More Components 
          1 - Minimal Damage 

II.   Time to Failure - determined from the data excursion intervals defined by the 
      Rocketdyne System for Anomaly and Failure Detection (SAFD). 

          3 - Long (10 secs or greater) 
          2 - Medium (1-10 secs) 
          1 - Short (less than 1 sec) 

Failure incidences with a very short time to failure were considered “unmanageable” from the point of view of 
near-term detection methodologies. 

III.  Frequency of Occurrence - derived from the number of occurrences of “generic” failures or 
      failures of a similar type. 

          3 - Chronic Problem 
          2 - Several Failures 
          1 - One Time Occurrence or Designed Out 

IV.   Power Level 

          3 - Failure Occurred during Mission Profile Power Level (start transient, 0-104% RPL, 
              shutdown transient) 
          2 - Failure Occurred between Power Level of > 104% RPL and <= 109% RPL 
          1 - Failure Occurred at Power Level > 109% RPL 


A higher score was given to failures which occurred during operation at normal mission profile power levels of 
the SSME. Failures at other power levels were assumed to be aggravated, at least in part, by the extended 
power operation. The extent of the influence of power level on the failure incidence is not easily determined. 
Since many failures are initiated by high-cycle-fatigue mechanisms, the same population of failures may 
occur during extended power operation, but at a lower number of cycles than that observed for mission power 
levels. 
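
The four criteria lend themselves to a simple additive score. The sketch below encodes the thresholds given above for time to failure and power level, sums the four individual scores into a composite between 4 and 12, and applies the worst-case rule of Section 2.4 when a failure mode has several incidences. The class and function names and the example incidents are hypothetical; they are not taken from the ranking matrix.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Incident:
    test_no: str
    failure_mode: str
    severity: int                  # criterion I, scored 1..3
    time_to_failure_s: float       # criterion II, in seconds
    frequency: int                 # criterion III, scored 1..3
    rpl_percent: Optional[float]   # criterion IV; None for start/shutdown transients


def time_score(seconds: float) -> int:
    if seconds >= 10.0:
        return 3   # long
    if seconds >= 1.0:
        return 2   # medium
    return 1       # short


def power_score(rpl_percent: Optional[float]) -> int:
    if rpl_percent is None or rpl_percent <= 104.0:
        return 3   # mission profile power level, including transients
    if rpl_percent <= 109.0:
        return 2
    return 1


def composite(incident: Incident) -> int:
    """Sum of the four criterion scores (minimum 4, maximum 12)."""
    return (incident.severity + time_score(incident.time_to_failure_s)
            + incident.frequency + power_score(incident.rpl_percent))


def mode_scores(incidents):
    """Score each failure mode by its worst-case (highest scoring) incidence."""
    scores = {}
    for inc in incidents:
        scores[inc.failure_mode] = max(scores.get(inc.failure_mode, 0), composite(inc))
    return scores


# Hypothetical incidents, for illustration only.
incidents = [
    Incident("T-001", "mode A", 3, 12.0, 2, 100.0),
    Incident("T-002", "mode A", 2, 0.5, 2, 109.0),
    Incident("T-003", "mode B", 1, 5.0, 1, 111.0),
]
ranking = mode_scores(incidents)
```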


2.4.2 Results of Failure Mode Rankings.— Table 2.4 presents the results of the failure mode rankings with 
scores for the individual criteria and the composite scores. The failure incidences are indicated by the test 
firing numbers. The final ranking is the sum of all individual scores with a possible maximum value of 12, and a 
minimum value of 4. The highest score of 10 was achieved by the four failures listed in Table 2.5. The UCRs for 
each of these tests were used to further compare these failures to achieve the rankings shown in Table 2.5. 
Table 2.6 further summarizes the ranking of all of the failure modes along with information pertaining to the 
Table 2.4 RANKING MATRIX FOR THE SSME FAILURE MODES

[Only the row labels of the continuation page survive; the Ranking Criterion score columns are not recoverable. Total score: High = 12, Low = 4.]

Failure Mode                        Test No.

COMPONENT: HPOTP
  LOX Seal Burning                  901-110
  Bearing Seal Failure              901-136
  Experimental Speed Probe          902-120

COMPONENT: INTERCONNECTS
  Crack/Leak                        750-259, 750-175
  Blockage                          902-112

COMPONENT: HGM
  FPB LOX Post Fracture             901-307
  FPB Injector Erosion              SF10-01
  FPB Injector Blockage             750-160
  Main Injector LOX Post Fracture   901-173, 901-331, 750-146, 901-183, 902-198
  Heat Exchanger Tube Leak          901-222



Table 2.5 FOUR MOST CRITICAL SSME FAILURE MODES 


Rank 

Failure 

Mode 

Component 

Test # 

Score 

Comments 

1 

Coolant Liner Buckle 

HPFTP 

901-436 

10 

Engine Loss 


Bearing Failure 

HPOTP 

901-136 

10 

Extensive Damage 


LOX Seal Burn 

HPOTP 

901-110 

10 

Extensive Damage 


Main Injector LOX 
Post Fracture 

HGM 

901-183 

10 

Several Failures 
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Table 2.6 FAILURE MODE RANKING AND DATA AVAILABILITY SUMMARY 


Score 

Description 

Test 

Data 

SAFD 

Test Date 



HPOTP bearing taAure 

901-136 

ii 

li 

i 

901-436 


HPFTP kaiser nut failure 


FPB LOX poet fracture 


PPB erosion 


MMnlrt LOX post traefcre 


HPFTP tsblne Made 


HPFTP UMmbMi 


Power transfer faiure 


Po w er trairaferfaBura 


Power transfer failure 


Power transfer faBura 


HPFTP T/A duct nature 


HPfTP gas leak 


MOV trotting 


Vitos osal faiure 


MOVmUndead 


Lae fat sensor faflura 


MCC outlet duct Picture 


FPB h^dor bfeefuge 


Main Injector fractura 


Main injector tract** 


Hast exchanger tubs Is* 


T/A duct 


T/A duct rupture 


MFV crack 


Noofe tube nature 


MM dUct 


Main ty. tractor* 


HPOTP apsed sensor 


HP.oxtdMarduct 


Norn lnal-1 


Ifomrai* IW^Mwr 


Nominal -40 


NomtaaJ 



06/05/76 


03/24/77 


09/06/77 




01/26/61 


07/12/60 


07/23/80 


09/21/81 


11/19/81 


12/01/77 


04/23/83 


03/27/82 


01/07/77 


04/07/82 


11/14/80 


12/27/78 


05/15/82 


10/03/78 


07/30/80 


03/27/85 


02/12/82 


03/31/78 


07/15/81 


12/06/78 


07/10/78 


10/15/81 


07/02/79 


07/24/85 


06/10/78 


09/02/81 


07/18/78 


08/27/82 
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availability of test data. No attempt was made to rank the failures that received identical scores other than the 
highest ranked failures shown in Table 2.5. 
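The composite ranking described in Section 2.4.2 is a plain sum of four criterion scores, each on a 1 to 3 scale, which gives the stated range of 4 to 12. A minimal Python sketch of that arithmetic follows; the function name and the example scores are illustrative assumptions and are not taken from Table 2.4.

```python
def composite_score(criterion_scores):
    """Sum the four criterion scores (criteria I-IV), each an integer from 1 to 3."""
    if len(criterion_scores) != 4:
        raise ValueError("expected exactly four criterion scores")
    for score in criterion_scores:
        if score not in (1, 2, 3):
            raise ValueError("each criterion score must be 1, 2, or 3")
    return sum(criterion_scores)


# Hypothetical scores for criteria I-IV (not taken from Table 2.4).
example_total = composite_score([3, 3, 1, 3])
```

With these hypothetical inputs the total is 10, the highest value reported for any of the ranked failures.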

2.4.3 Summary of Failure Mode Rankings.— The 14 failures receiving a score of 9 or 10 make up five
generic types of failure modes occurring within 3 major LRUs. The HPOTP has experienced problems due to
seal and bearing failures. The HGM has experienced a significant number of injector problems in both the
main injector and the fuel preburner injector. The third LRU, the HPFTP, has experienced problems with
power transfer components (including turbine blades) and with seals. One isolated incident of the HPFTP
involved the failure of the Kaiser Nut, which resulted in extensive engine damage. Thus, the failure detection
methods in Phase I, Task 2 addressed these five failure modes of the three LRUs. In the case where data did
not exist for a highly ranked failure mode, such as the main injector failure (901-183), data from a lower
ranked, but similar failure was substituted (such as main injector failure 901-331 or 902-198).

SECTION 3.0

METHODS TO DETECT FAILURES AND MINIMIZE ENGINE DAMAGE 


The primary goal of the SSME HMS is to detect engine failures as early as possible, and then direct the 
engine controller to shut down the engine to minimize damage. Program requirements dictate that the HMS 
must use existing instrumentation and near-term technology concepts to enable ground testing within five 
years. In the HMS framework development process, UTRC has emphasized new diagnostic sensors and 
failure detection algorithms as key ingredients to enhance engine health monitoring. 

3.1 Failure Detection Algorithms 

Most failures are preceded by growing intolerances or imbalances in the engine which initially manifest 
themselves through subtle deviations in engine parameters. The process of failure detection is concerned with 
observing these deviations from nominal operation in the sensor measurements. A component malfunction 
usually results from a number of distinct failure modes, and each of these failure modes may affect the sensor 
measurements in a different manner. The failure classification process is concerned with observing the 
different ways in which the various failure modes affect the sensor measurements. 

The success of failure detection algorithms is directly related to the information content of the sensor 
signal. The current SSME sensor set is primarily used to monitor engine performance and provide input to the 
controller. Thus, the scope of the failure detection algorithms is limited to only those faults that produce 
changes in the engine operation which the sensors can detect. 

Based on the preliminary analysis of the SSME sensor data, a failure can be identified as a deviation from 
the normal or the design envelope of the engine operation. It has not been possible, however, to identify a 
characteristic pattern of deviations in the sensor data associated with a particular type of failure mode. Given 
an extensive database of failure incidence documentation and data, it may be possible to perform fault 
identification and isolation. The scope of this program, however, and the limited failure documentation 
dictated that a methodology for failure detection, but not for failure identification and isolation, would be 
designed. 

3.1.1 Candidate Failure Detection Algorithms.— There is not a single optimal failure detection algorithm. 
A range of techniques, from basic signal conditioning to pattern recognition and artificial intelligence, 
requires examination to determine the best approach. 

3.1.1.1 Pattern Recognition (PR) Methods.— Pattern recognition techniques are data-driven, empirical
methods, well suited for the SSME because it has undergone extensive testing and has produced a large 
experimental database of normal operations and failure modes. PR techniques can involve signal processing, 
feature extraction, and classification. These techniques are trained to discriminate between the normal and 
abnormal system behavior by means of a training data set, and then use that capability to classify test data into 
normal and failure classes. 

3.1.1.2 Model-based Fault Detection and Isolation (FDI) Methods.— In the absence of experimental data,
an engine simulation model is required. The SSME PBM and DTM models have been developed to predict the
SSME performance during steady-state and transient operations. However, many sensor measurements show
drifts and biases during normal operations that cannot be accurately accounted for by the SSME models.
Model-based fault detection and isolation methods require an accurate mathematical model of the engine.
Most of these methods work well with linear and time-invariant systems. However, modeling errors and
system nonlinearities tend to affect FDI algorithm performance in terms of robustness and detection
sensitivity.

FDI algorithms track the normal operations of the engine by processing the sensor measurements and 
utilizing an accurate mathematical model of the engine which links the engine inputs and control parameters 
with the output measurements. When the engine is operating in a normal mode, the sensor outputs follow 
certain predictable trajectories within specified limits of accuracy. A failure is indicated when the sensor 
measurements deviate from the model prediction. 

3.1.1.3 Artificial Intelligence (AI) Methods.— In the absence of accurate mathematical models and
experimental data, AI techniques make use of expert knowledge to develop qualitative models and perform 
qualitative reasoning about failure conditions. AI techniques use automated decision making processes 
based on a qualitative model of a system to deal with system performance or system failures. The automated 
decision making processes rely on knowledge-based rules or qualitative expressions derived from traditional 
physics concepts, and perform detailed analysis of the system to provide intelligent advice. These methods 
were not employed in the HMS due to the lack of adequately defined fault characteristics which could be used 
to develop the models of failure modes. 

3.1.2 UTRC Failure Detection Methodology.— The availability of the SSME teststand data and the
complexity of the SSME analytical models helped define the rationale for choosing the data-driven
approaches for evaluation and testing. The SSME analytical models are performance driven, and do not
generate failure related information. The models are also not capable of real-time operation on a
minicomputer. To compensate for the lack of sufficient nominal teststand data, the SSME models were mainly
utilized to generate ‘design point’ parameter values for nominal engine operation. The data-driven 
approaches were thus utilized to characterize engine operation during nominal or failure modes. 

The UTRC failure detection methodology is a two-step process. The first step involves characterizing the
nominal operation of the engine parameters. Analytical models and data from nominal tests are utilized to
build empirical models for normal engine operations. The second step compares the teststand data with the
model output, and declares a failure if the measured data diverge from the model values. Figure 3.1 shows the
overall failure detection scheme.

3.1.3 Algorithms for UTRC Failure Detection Methodology.— The SSME operates in three phases: startup, 
mainstage, and shutdown. The startup and shutdown are open-loop operations commanded by time 
sequenced opening or closing of the SSME fuel and LOX valves, while during mainstage, the main combustion 
chamber pressure and the oxidizer/fuel mixture ratio are regulated by the controller. The algorithms used for 
failure detection grew out of an evolutionary process that systematically studied the information content of the 
SSME sensor data. Most of the data show stationary behavior during mainstage; significant changes in the 
data values during nominal operation are due to power level changes. Thus, algorithms (such as time series)
that require stationary behavior of the data will work well during the mainstage operation. During startup and 
shutdown, the sensor data show nonstationary, transient behavior, and nonlinear regression algorithms are 
more suitable. The time series, linear/nonlinear regression, and clustering algorithms described below are the 
most promising of various approaches for failure detection. 


Fig. 3.1 HIGH LEVEL FAULT DETECTION
SYSTEM STRUCTURE

3.1.3.1 Time Series Algorithms.— A time series represents a chronological sequence of observations of a
particular variable. The SSME teststand data consist of time sequenced measurements of temperature,
pressure, speed, and flowrates at various locations on the engine. The time series analysis techniques involve
developing models based on the measured data to explain the behavior of the past data and to predict the 
behavior of the future data. It is important to note that the actual sensor values are not of interest to the time 
series model, but rather that the structure of the recent past data resembles that of the near future data. 

The first step in the analysis involves developing univariate models. Univariate models explain the 
behavior of a single parameter based on the past data values of that parameter. The underlying assumption in 
a univariate model is that the system is in a steady state. For example, univariate models can be developed to 
characterize engine measurements when the engine is operating at a given power level. A time series algorithm
will only work for stationary data, as it detects a fault by indicating the presence of a nonstationarity in the
sensor data; anomalous behavior is indicated when the measured value of the parameter starts to diverge from 
the model predicted values. A time series algorithm will adapt to gradual trends evident in the data and thus 
will not detect those faults which are indicated by such behavior. 

The general form of a time series model is given by:

$$ z_k = \sum_{i=1}^{p} b_i z_{k-i} + r_k - \sum_{i=1}^{q} c_i r_{k-i} \qquad (1) $$

Here, $z_k$ is the observation at time $t_k$, and $r_k$, called the residual at time $t_k$, is an uncorrelated gaussian random
variable. The summation limits $p$ and $q$, and the parameters $b_i$ and $c_i$ are adjusted to fit the data. This general
time series model is called the mixed Autoregressive Moving Average (ARMA) Process.
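Given a fitted model, Eq. (1) can be rearranged to recover the residual at each time step from the data and the earlier residuals. The following is a minimal Python sketch of that recursion. The function name is hypothetical, the moving-average weights are called `c` here as an assumed label for the second set of coefficients in Eq. (1), and samples before the start of the record are treated as zero, which is an assumption of this sketch rather than something stated in the report.

```python
def arma_residuals(z, b, c):
    """Residuals r_k implied by Eq. (1) for data z, AR weights b, MA weights c.

    Rearranged form: r_k = z_k - sum_i b_i * z_{k-i} + sum_i c_i * r_{k-i}.
    Terms that would reach before the first sample are dropped (assumption).
    """
    r = []
    for k in range(len(z)):
        ar_part = sum(b[i] * z[k - 1 - i] for i in range(len(b)) if k - 1 - i >= 0)
        ma_part = sum(c[i] * r[k - 1 - i] for i in range(len(c)) if k - 1 - i >= 0)
        r.append(z[k] - ar_part + ma_part)
    return r
```

When the model describes the data well, the sequence returned by this recursion should look like uncorrelated noise; a persistent structure in it is the signature the detection scheme looks for.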

An observed time series $(z_1, z_2, z_3, \ldots, z_n)$ can be thought of as a particular realization of a stochastic
process. Stochastic processes in general can be described by an n-dimensional probability distribution. To
infer such a general probability structure from just one realization of a stochastic process will be impossible
unless some simplifying assumptions are made. One such assumption is the stationarity of stochastic
processes. The stationarity condition implies that the mean, $\eta$, and the variance, $V(z)$, of the process are
constant and that the autocovariance

$$ \mathrm{Cov}(z_t, z_{t-k}) = E[(z_t - \eta)(z_{t-k} - \eta)] \qquad (2) $$

and the autocorrelation

$$ \rho_k = \frac{\mathrm{Cov}(z_t, z_{t-k})}{[V(z_t)\,V(z_{t-k})]^{1/2}} \qquad (3) $$

depend only on the time difference, or lag, $k$, between the two observations.
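For a single measured record, the autocovariance and autocorrelation are estimated from sample averages. A minimal Python sketch is given below, assuming the record is stationary so that one mean and one variance apply to every sample; the function name and the 1/n normalization are choices made for this illustration.

```python
def sample_autocorrelation(z, max_lag):
    """Return the sample autocorrelation at lags 0..max_lag.

    Assumes a stationary, non-constant series, so a single mean and a single
    variance (the lag-0 autocovariance) are used for all samples.
    """
    n = len(z)
    mean = sum(z) / n
    dev = [value - mean for value in z]
    c0 = sum(d * d for d in dev) / n  # lag-0 autocovariance (variance)
    rho = []
    for k in range(max_lag + 1):
        ck = sum(dev[t] * dev[t - k] for t in range(k, n)) / n
        rho.append(ck / c0)
    return rho
```

For example, a strictly alternating series of eight samples, +1, -1, +1, ..., has a lag-1 autocorrelation of -7/8 under this estimator.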

For a nonstationary stochastic process, a changing mean can often be described by low order 
polynomials in time. The coefficients in these polynomials are not constant, but vary randomly with time. Such 
nonstationary sequences can be transformed into stationary sequences by taking successive differences of the 
series. In case of nonstationary variance, the time series is subjected to logarithmic or power transformations. 
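The effect of successive differencing on a polynomial trend can be seen with a short Python sketch. The function name and the two synthetic series are illustrative assumptions; no SSME data are used.

```python
def difference(z, order=1):
    """Apply first differences `order` times: each pass maps z_k to z_k - z_{k-1}."""
    out = list(z)
    for _ in range(order):
        out = [out[i] - out[i - 1] for i in range(1, len(out))]
    return out


# Synthetic trends: one difference flattens a linear trend,
# two differences flatten a quadratic trend.
linear_trend = [2.0 * t + 1.0 for t in range(6)]
quadratic_trend = [float(t * t) for t in range(6)]
flat_from_linear = difference(linear_trend)
flat_from_quadratic = difference(quadratic_trend, order=2)
```

Each pass shortens the series by one sample, so the differenced record is slightly shorter than the original.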

Another approach to simplifying the time series models is to specialize the general ARMA models to
either Autoregressive or Moving Average models as described below:

Autoregressive (AR) Process.— If it is assumed that the present observation is a linear combination of past
observations plus a gaussian random variable, then

$$ z_k = \sum_{i=1}^{p} b_i z_{k-i} + r_k \qquad (4) $$

In this case, the residual, $r_k$, is the only portion of the measurement, $z_k$, which cannot be predicted from
previous measurements.
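For the simplest case, a first-order AR model, the single weight can be estimated in closed form by least squares. The Python sketch below assumes zero-mean data and order p = 1; both function names are hypothetical, and this is not presented as the fitting procedure used in the study.

```python
def fit_ar1(z):
    """Least-squares estimate of b_1 in z_k = b_1 * z_{k-1} + r_k (zero-mean data)."""
    numerator = sum(z[k] * z[k - 1] for k in range(1, len(z)))
    denominator = sum(z[k - 1] ** 2 for k in range(1, len(z)))
    return numerator / denominator


def ar1_residuals(z, b1):
    """Part of each sample not predicted from the previous sample."""
    return [z[k] - b1 * z[k - 1] for k in range(1, len(z))]
```

A noise-free geometric sequence such as 1, 0.5, 0.25, 0.125 is reproduced exactly by b_1 = 0.5, so its residuals are zero; real sensor data leave a nonzero residual that carries the unpredictable part of the signal.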

Moving Average (MA) Process.— If the time series is assumed to be generated by a finite linear
combination of past and present inputs in the form of uncorrelated random variables, then the difference
equation becomes

$$ z_k = r_k - \sum_{i=1}^{q} c_i r_{k-i} \qquad (5) $$

The model always produces a stationary process.

Time Series Algorithms for SSME Data.— The first step in time series analysis involves developing 
univariate models. The univariate models explain the behavior of a single parameter based upon the structure 
of its past values. For the SSME, the univariate models are well suited to characterize parameter behavior at a 
given power level. Parameters such as main combustion chamber pressure, fuel preburner cavity pressure,
turbopump inlet and discharge pressures and temperatures most often show stationary behavior during 
nominal operation at a given power level. 

Univariate ARMA models have been developed for the set of parameters listed in Table 3.1 by selecting
training data sets of 100 points (4 secs duration). The structure of the ARMA models for different parameters 
is selected based upon five criteria: 1) a loss function based on the mean square error, 2) a prediction error 
based on Akaike’s Final Prediction Error, 3) residual analysis, 4) frequency response, and 5) pole-zero plots. 

TABLE 3.1. - SET OF PARAMETERS USED IN ARMA MODEL DEVELOPMENT.

 1. LPFT_DS_PR        10. FPB_PC
 2. LPOP_DS_PR        11. PBP_DS_PR
 3. HPFP_IN_PR        12. MCC_CLNT_DS_PR
 4. HPFP_DS_PR        13. MCC_FU_INJ_PR
 5. HPOP_DS_PR        14. MCC_PC
 6. HPOT_IN_T         15. HPFT_DS_T
 7. HPFP_IN_T         16. OPOV_ACT_POS
 8. LPFP_SPD          17. LPOP_SPD
 9. HPFP_SPD          18. HPOT_SPD



The simplest criterion for selecting the ARMA model structure is to compute the sum of squared error
(the Loss Function), and pick the structure with the smallest Loss Function. But if the model is validated on
the same data set that it was trained on, the loss function will always decrease as the model order increases. To
compensate for the automatic decrease in the loss function, other selection criteria need to be considered
simultaneously. The second criterion that is considered is Akaike’s Final Prediction Error (FPE), which is
formed as

$$ \mathrm{FPE} = \frac{1 + n/N}{1 - n/N} \times (\text{the Loss Function}) \qquad (6) $$

where n is the total number of estimated parameters, and N is the length of the data record.
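The penalty that the FPE places on extra parameters can be illustrated with a short Python sketch. It assumes the standard form of Akaike's FPE, the loss function multiplied by (1 + n/N)/(1 - n/N); the function names and the loss values are hypothetical and are not results from this study.

```python
def final_prediction_error(loss, n_params, n_samples):
    """Akaike's FPE in its standard form: loss * (1 + n/N) / (1 - n/N)."""
    ratio = n_params / n_samples
    if ratio >= 1.0:
        raise ValueError("need more data points than estimated parameters")
    return loss * (1.0 + ratio) / (1.0 - ratio)


def select_order(losses, n_samples):
    """Return the parameter count with the smallest FPE.

    `losses` maps the number of estimated parameters to the training loss.
    """
    return min(losses, key=lambda n: final_prediction_error(losses[n], n, n_samples))


# Hypothetical training losses for a 100-point record (not SSME results).
hypothetical_losses = {2: 1.00, 4: 0.90, 8: 0.89, 16: 0.885}
best_n = select_order(hypothetical_losses, n_samples=100)
```

In this made-up case the training loss keeps falling as parameters are added, but the FPE is smallest at 4 parameters, because beyond that point the small gain in fit no longer offsets the penalty.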

Another criterion in the model order selection is residual analysis. The residuals associated with the data
and a given model ideally should be white (uncorrelated for all lags) for the model to be a correct description of
the system. If the residual correlation functions are substantially outside the 99% confidence intervals 
established for the training data set, then the corresponding model is not a good representation of the data. 
The last two criteria compare the model properties in terms of pole-zero plots and the frequency response. A 
pole-zero cancellation or near cancellation, and high frequency artifacts in the frequency response usually 
indicate that lower order models may be more appropriate. Figures 3.2, 3.3 and 3.4 illustrate the model order 
selection process using the different criteria. 

As previously stated, for the model to be a correct description of the system, the residuals associated with 
it and the data should ideally be white; the correlation function of the residuals should remain within the 
confidence interval for lags greater than zero. Figure 3.5 shows the correlation function of residuals with 99% 
confidence intervals for low pressure fuel turbopump discharge pressure (LPFT_DS_PR) of test 901-110 at 
75% of rated power level. The univariate ARMA models have been developed by selecting training data sets of 
100 points (4 secs duration). The confidence intervals associated with the residuals widen as the number of 
data points for estimating a model is decreased. The design of a detection system would have to consider the 
trade-offs between a smaller training set and a wider confidence interval. 
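The trade-off between training set size and interval width can be made concrete with a short Python sketch. It assumes the common large-sample approximation that, for white residuals, the sample autocorrelation at a nonzero lag is roughly normal with variance 1/N, so that the 99% bound is about 2.576 divided by the square root of N. The report does not state how its intervals were computed, so this formula and all function names are assumptions.

```python
import math


def autocorrelation(x, max_lag):
    """Sample autocorrelation of x at lags 1..max_lag (x must not be constant)."""
    n = len(x)
    mean = sum(x) / n
    dev = [value - mean for value in x]
    c0 = sum(d * d for d in dev) / n
    return [
        sum(dev[t] * dev[t - k] for t in range(k, n)) / (n * c0)
        for k in range(1, max_lag + 1)
    ]


def whiteness_bound(n, z_crit=2.576):
    """Approximate 99% bound on the autocorrelation of white residuals."""
    return z_crit / math.sqrt(n)


def lags_outside_bound(residuals, max_lag):
    """Lags (1 and above) whose correlation falls outside the bound."""
    bound = whiteness_bound(len(residuals))
    rho = autocorrelation(residuals, max_lag)
    return [k for k, r in enumerate(rho, start=1) if abs(r) > bound]
```

Under this approximation a 100-point training set gives a bound of about 0.26, while a 25-point set gives about 0.52, so a smaller training set makes the test less sensitive to correlated residuals.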

At the onset of a failure, the model output and the measured data diverge, and the residual correlation 
function lies outside the confidence intervals, as shown in Figure 3.6. Figure 3.7 shows the residual correlation 
function for an entire test duration (test 901-110); the peak in the correlation coefficient indicates an abnormal
event.

The univariate time series models have the potential for rapid detection of changes in the parameter 
values, assuming that the parameter shows a stationary behavior before the change takes place. A fault is 
indicated by the evidence of nonstationarities in the parameter values. Oscillatory behavior, for example, is 
nonstationary. Defects which manifest themselves as oscillatory behavior in a sensor that is normally constant 
will be detected by the time series algorithm, regardless of whether or not the actual value of the sensor is 
within its nominal operating range. 

For parameters that exhibit nonstationary behavior, for example OPOV_ACT_POS and HPOT_DS_T in test 901-364 (shown in
Figures 3.8 and 3.9), the data must be subjected to a differencing
operation to remove the nonstationarity before an ARMA model can be developed. It was observed that for
both the OPOV_ACT_POS and HPOT_DS_T, the differencing operation removed the effects on the
parameter values due to actual failures. For such parameters, univariate ARMA models are clearly not
suitable for failure detection.

Fig. 3.3 ARMA MODEL ORDER DETERMINATION USING RESIDUAL ANALYSIS

Fig. 3.4 ARMA MODEL ORDER DETERMINATION POLES AND ZEROS PLOT
Poles and zeros of model with confidence regions.

Fig. 3.5 RESIDUAL CORRELATION FUNCTION FOR LPFT DISCHARGE PRESSURE
WHEN THE ARMA MODEL AND THE DATA ARE IN GOOD AGREEMENT
(TEST 901-110).

Fig. 3.6 RESIDUAL CORRELATION FUNCTION FOR LPFT DISCHARGE PRESSURE
WHEN THE MEASURED DATA DIVERGED FROM THE ARMA MODEL
PREDICTION (TEST 901-110).

Fig. 3.7 FAILURE DETECTION USING ARMA MODELS FOR TEST 901-110

3.1.3.2 Regression Analysis.— Regression analysis can be used to detect faults in a system which are
indicated by significant deviations of a sensor measured value from its expected nominal value. Regression 
analysis exploits sensor-to-sensor relationships to establish a set of equations (models) to predict the 
expected value of the sensor output during system operation. Deviation of the actual measured value from the 
model estimate indicates a fault. Figure 3.10 depicts the methodology for fault detection using regression 
analysis: models are developed in a training phase; an error signal is formed by subtracting the measured value 
from the linear estimate; the detection scheme determines if the error signal indicates a fault condition. 

3.1.3.2.1 Linear Regression Analysis.— Linear regression analysis leads to the development of equations
of the form:

$$ Y = a_1 X(1) + a_2 X(2) + \cdots + a_n X(n) \qquad (7) $$

The dependent variable, Y, is estimated with a weighted linear combination of the independent 
variables, X(i). Thus, the value of one sensor, Y, is estimated by using data obtained from other sensors, X(i). 
If the estimate of a sensor value differs significantly from its measured value, a change in engine performance 
has occurred such that the initial relationship between the sensor measurements is no longer valid. In such a 
case, the engine would be considered to be operating in an abnormal manner, which could indicate the 
presence of a fault. 
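As a minimal sketch of this idea, the weights of Eq. (7) can be fit by ordinary least squares on nominal data and then used to form an error signal on new data. The Python example below uses NumPy with small synthetic arrays; the function names and all numbers are assumptions made for illustration and do not represent SSME sensors.

```python
import numpy as np


def fit_linear_estimator(X, y):
    """Least-squares weights a such that y is approximated by X @ a (no intercept)."""
    a, *_ = np.linalg.lstsq(X, y, rcond=None)
    return a


def estimation_error(X, y, a):
    """Error signal: linear estimate minus measured value."""
    return X @ a - y


# Synthetic "nominal" data: two independent sensors and one dependent sensor.
X_nominal = np.array([[1.0, 2.0], [2.0, 1.0], [3.0, 5.0], [4.0, 3.0]])
y_nominal = X_nominal @ np.array([2.0, -1.0])
weights = fit_linear_estimator(X_nominal, y_nominal)

# A synthetic off-nominal reading: the third sample of the dependent sensor shifts by 5.
y_shifted = y_nominal.copy()
y_shifted[2] += 5.0
error_signal = estimation_error(X_nominal, y_shifted, weights)
```

The error signal is zero wherever the nominal relationship holds and departs from zero only at the shifted sample, which is the behavior a detection threshold would act on.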

The degree of association between two variables is quantified in terms of the correlation coefficient, R, 
and can be used to select the valid relationships for which the estimation equations can be formulated. Note, 
however, that the correlation coefficient should not be the sole criterion for selecting the dependent and
independent variables, as the value of this statistical parameter contains no information regarding the 
physical operation of the system under study. 

It was observed that the CADS sensors were highly correlated (see Figures 3.11 and 3.12). These large 
correlation values could be incorrectly interpreted to mean that significant, direct physical relationships exist 
between the majority of the engine parameters monitored by the CADS sensors. Furthermore, it could be 
incorrectly interpreted to mean that these sensors should be included as independent variables in estimating 
the dependent variable to which they are correlated. Detailed analysis of the SSME test data revealed that the 
high degree of sensor-to-sensor correlation could be attributed solely to the change in power levels that the 
SSME experiences in typical ground test or flight profiles. 

The engine thrust level is proportional to the MCC_PC. Using the MCC_PC as the independent variable,
X, the power dependency

$$ Y = m X + b \qquad (8) $$

represents the resulting estimate for each of the sensors. The detrended time sequence for each sensor was
produced by subtracting the estimate from the corresponding sensor data. An example of this process is
shown in Figures 3.13 - 3.15. The MCC_PC (Figure 3.14) was used to estimate the Fuel Preburner Oxidizer
Valve Position (FPOV POS). The estimate was subtracted from the actual measured FPOV POS (Figure 3.13)
to produce the difference between the measured data and its estimated value (Figure 3.15).

Fig. 3.10 METHODOLOGY FOR FAULT DETECTION
USING LINEAR REGRESSION ANALYSIS

Fig. 3.11 HISTOGRAM OF CADS SENSOR CORRELATION

The CADS sensors were highly correlated.

Fig. 3.12 SENSOR CORRELATION:

The CADS sensor HPFP discharge pressure data
was highly correlated with the MCC pressure.

Once the MCC_PC trend was removed from the data, the sensor-to-sensor correlation decreased
significantly (Figure 3.16), indicating that most of the sensor interdependency was associated with their 
dependency on the MCC_PC. Thus, using the MCC_PC as the independent variable, linear regressive 
equations (also known as models) for the other engine parameters were derived for estimating individual 
sensor values, and thereby identifying off-nominal SSME operation at steady-state power levels. 
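The detrending step, fitting Y = mX + b against the MCC_PC and subtracting the estimate from the sensor data, can be sketched in a few lines of Python. The function names and the sample values below are hypothetical and are chosen only so that the arithmetic is easy to follow.

```python
def fit_line(x, y):
    """Closed-form least-squares slope m and intercept b for y = m*x + b."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    m = sxy / sxx
    return m, mean_y - m * mean_x


def detrend(x, y, m, b):
    """Measured value minus the estimate m*x + b, sample by sample."""
    return [yi - (m * xi + b) for xi, yi in zip(x, y)]


# Hypothetical chamber pressure steps and a sensor that tracks them linearly.
mcc_pc = [2000.0, 2500.0, 3000.0, 3000.0, 2500.0]
sensor = [80.0, 85.0, 90.0, 90.0, 85.0]
slope, intercept = fit_line(mcc_pc, sensor)
detrended = detrend(mcc_pc, sensor, slope, intercept)
```

Because the hypothetical sensor depends only on the chamber pressure, the detrended sequence is flat at zero; any variation that survives detrending in real data is what remains to be explained by something other than the power level.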

Linear models were developed for each sensor using data from the nominal data set 902-457. Evaluation 
of the linear regression approach by using the failure data indicated that the uncertainty of the condition of the 
engine components and build variations translated into fault indications, even though none were apparent.
The single sensor model did not account for the variability, and thus, the deviation limits selected for the model
resulted in either high false alarm (FA) rates or high missed detection of fault (MDF) rates. A model with
tight limits flags normal parameter variability as a fault and thus yields a high FA rate, while a model with
higher limits treats all of the parameter variability as acceptable, regardless of whether the variation indicates
a fault, and thus yields a high MDF rate.

Thus the linear model approach was not sufficiently robust to accommodate normal engine variations. 
Consequently, linear models were not selected as viable for fault detection. The knowledge gained in exploring 
the linear models pointed to more robust, multivariate linear models that provide insight into the deviation 
from nominal operation. This technique is discussed in Section 3.1.3.3.

3.1.3.2.2 Nonlinear Regression Analysis.— During the startup and shutdown phases, the SSME operates
in an open-loop mode with time sequenced commands for the opening and closing of the propellant and 
oxidizer valves. The MCC_PC is a function of the valve positions or the propellant flow rates. An algorithm 
based on nonlinear regression analysis was used to define the relationships between the MCC_PC and the 
valve positions or the flow rates during the startup and shutdown phases. 

Description of Nonlinear Regression Algorithm (RESID).— The Recursive Structure Identification
(RESID) algorithm is a nonlinear regression method based on the adaptive learning network concept. It 
approximates a complex nonlinear relationship between pattern features with a network of simple binary 
quadratic functions. It builds up the interconnections between different features recursively. The algorithm 
examines all pairwise quadratic combinations of features from the given feature set and builds a higher order 
nonlinear regression equation. The regression equation acts as the discriminant function and contains only 
those features that minimize, in the least-square sense, the total misclassification error. 

In order to build the network, the algorithm provides a training and selection step. The input feature set 
is partitioned into training, selection, and evaluation subsets. During the training phase, the coefficients of the 
quadratic fit are determined for all pairwise combinations of the input variables. In the selection process, 
elements with poor performance in terms of the least-square-error criterion are rejected. The remaining 
elements become inputs to the next level of training, and selection steps. These training and selection steps are 
repeated until the performance measure does not show any further improvement. The final phase, the 
evaluation phase, is where the overall performance of the network is determined. 


Let $X = (x_1, x_2, \ldots, x_N)$ be the N-dimensional feature vector and $w_1, w_2, \ldots$ be the coefficients or the
weights of the decision function, $D(X,w)$. Generally, $D(X,w)$ is a nonlinear decision function. If two classes $C_1$,
$C_2$ are separable, then the coefficients of the decision function are determined in RESID by minimizing a loss
function based on the mean square error. A mean square error loss function can be written as:

$$ J(w) = E(e^2 \mid w), $$

where $e$ denotes an error measure and is given by:

$$ e = D(X,w) - g(X). $$

The function $g(X)$ is some function with a desired classification property. For a two-class problem, $g(X)$ can be
defined as:

$$ g(X) = \begin{cases} 1, & \text{for } X \text{ an element of class } C_1, \\ -1, & \text{for } X \text{ an element of class } C_2. \end{cases} $$

Fig. 3.16 HISTOGRAM OF CADS SENSOR CORRELATION

The correlation decreased when the variations in the FPOV
associated with the engine power variation were removed.


Figure 3.17 shows a two-layered RESID network. The three input variables are $x_1$, $x_2$, and $x_3$, and the
decision function is given by $D(X,w)$. An individual network element, E, is shown in Figure 3.18. The network
element, E, combines the two inputs $x_1$ and $x_2$ by a quadratic equation to give an output, $y_1$:

$$ y_1 = w_0 + w_1 x_1 + w_2 x_2 + w_3 x_1 x_2 + w_4 x_1^2 + w_5 x_2^2, \qquad (9) $$

where $w_0$, etc., are the coefficients of the quadratic fit.
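A single network element of this kind can be fit by ordinary least squares once the six quadratic terms are formed from the two inputs. The Python sketch below uses NumPy and synthetic data; it illustrates only the element-level fit of Eq. (9), not the full RESID training and selection procedure, and every name and number in it is an assumption.

```python
import numpy as np


def quadratic_features(x1, x2):
    """Columns [1, x1, x2, x1*x2, x1^2, x2^2], in the term order of Eq. (9)."""
    x1 = np.asarray(x1, dtype=float)
    x2 = np.asarray(x2, dtype=float)
    return np.column_stack([np.ones_like(x1), x1, x2, x1 * x2, x1 ** 2, x2 ** 2])


def fit_element(x1, x2, target):
    """Least-squares coefficients w_0..w_5 for one two-input quadratic element."""
    features = quadratic_features(x1, x2)
    w, *_ = np.linalg.lstsq(features, np.asarray(target, dtype=float), rcond=None)
    return w


def element_output(w, x1, x2):
    """Evaluate the element output y_1 for the given inputs and coefficients."""
    return quadratic_features(x1, x2) @ w


# Synthetic training set on a 3-by-3 grid with known coefficients.
grid = [(a, b) for a in (-1.0, 0.0, 1.0) for b in (-1.0, 0.0, 1.0)]
x1_train = [p[0] for p in grid]
x2_train = [p[1] for p in grid]
w_known = np.array([1.0, 2.0, -1.0, 0.5, 0.3, -0.2])
y_train = element_output(w_known, x1_train, x2_train)
w_fit = fit_element(x1_train, x2_train, y_train)
```

In a layered network, the outputs of several such elements would in turn serve as inputs to elements in the next layer, with poorly performing elements discarded at each selection step.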

The decision function $D(X,w)$ is given by

$$ D(X,w) = a z_1 + b z_2 + c z_3, \qquad (10) $$

where $z_1 = f(y_1, y_2)$, etc., and $a$, $b$, $c$ are functions of the weights, $w_0$, $w_1$, etc.

Application of RESID to the SSME Data.— During the startup and shutdown phases of the SSME
operation, the controller operates in an open-loop mode with time sequenced commands to the five SSME
valves (MFV, MOV, CCV, FPOV, and OPOV). Thus, during startup and shutdown, the MCC_PC is a function
of the five valve positions. Since these valves control the flow of propellants, MCC_PC is also a function of the 
fuel and LOX flow rates. 

The RESID algorithm was used to predict the MCC_PC as a function of propellant valve positions as 
well as propellant flow rates. Figure 3.19 shows the measured MCC_PC during startup along with its 
prediction calculated from propellant volumetric flow rates. Similarly, Figure 3.20 shows the measured 
MCC_PC during startup along with the predicted MCC_PC calculated from the propellant valve positions. 
Figures 3.21 and 3.22 show the ability to obtain similar MCC_PC predictions during shutdown.

The process of failure detection using RESID involves first running the algorithm on nominal data to
generate the difference between the measured and predicted values of MCC_PC during startup (the nominal
error signal), as shown in Figure 3.23, and then computing the error signal for the test under evaluation. A failure is detected during startup
or shutdown in other tests if the error signal crosses a threshold established at 3 times the standard deviation
for the nominal error signal (see Figures 3.24 and 3.25).
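The threshold test can be sketched in Python as follows. The sketch assumes a symmetric threshold applied to the magnitude of the error signal and uses the population standard deviation of the nominal error; neither detail is stated in the text, and the function names and sample values are hypothetical.

```python
import statistics


def three_sigma_threshold(nominal_error, multiple=3.0):
    """Threshold set at `multiple` times the standard deviation of the nominal error."""
    return multiple * statistics.pstdev(nominal_error)


def first_exceedance(error_signal, threshold):
    """Index of the first sample whose magnitude exceeds the threshold, or None."""
    for index, value in enumerate(error_signal):
        if abs(value) > threshold:
            return index
    return None


# Hypothetical error signals (arbitrary units, not SSME data).
nominal_error = [1.0, -1.0, 1.0, -1.0]
threshold = three_sigma_threshold(nominal_error)
detection_index = first_exceedance([0.5, -2.0, 3.5, 1.0], threshold)
```

Here the nominal error has a standard deviation of 1, so the threshold is 3, and the test signal first crosses it at the third sample (index 2).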

Failures occurred during startup or shutdown in three instances (startup failures: 901-284 and 901-222;
shutdown failure: 750-168). The RESID failure detection algorithm incorporating a model of MCC_PC as a
function of propellant flows was implemented. The algorithm successfully detected the startup and shutdown
failures, and no false alarms were indicated with respect to the two nominal tests. Also, with respect to those
tests in which the failure incidence occurred during mainstage operation, there were no false alarms indicated
(during startup) by this algorithm.

Fig. 3.18 NETWORK ELEMENTS

Fig. 3.19 PREDICTED MCC PRESSURE AS A FUNCTION OF PROPELLANT
FLOW RATES DURING STARTUP (TEST 902-463).

Fig. 3.20 PREDICTED MCC PRESSURE AS A FUNCTION OF PROPELLANT
VALVE POSITIONS DURING STARTUP (TEST 902-463).

Fig. 3.21 PREDICTED MCC PRESSURE AS A FUNCTION OF PROPELLANT
FLOW RATES DURING SHUTDOWN (TEST 902-463).

Fig. 3.22 PREDICTED MCC PRESSURE AS A FUNCTION OF PROPELLANT
VALVE POSITIONS DURING SHUTDOWN (TEST 902-463).

Fig. 3.24 NONLINEAR REGRESSION METHOD: DETECTION OF FAILURE
DURING STARTUP

Error in predicted MCC pressure crossing threshold indicates failure.

Fig. 3.25 NONLINEAR REGRESSION METHOD: DETECTION OF FAILURE
DURING SHUTDOWN

Error in MCC pressure crossing threshold indicates failure.

3.1.3.3 Cluster Analysis.—The detection of faults in the SSME can be viewed as a Pattern Recognition 
problem: failure detection consists of distinguishing patterns in the SSME test data which are associated with 
abnormal engine operation from those associated with nominal operation. 

Pattern Recognition techniques encompass a variety of methods for recognizing patterns within a data 
sample which are associated with some phenomena (objects or processes measurable with a sensor suite). The 
majority of these pattern recognition algorithms are based on two fundamental procedures: a generic 
characterization of data patterns generated by each phenomenon to be recognized; and a recognition process 
in which a data pattern is compared to the generic characterization, and specific phenomena are declared to 
be present. How the characterization is defined and how the comparison is performed depend upon the 
technique used. In the following sections, these issues are discussed for the pattern recognition technique 
called clustering, first in general terms, and then with respect to fault detection in the SSME. 

3.1.3.3.1 Clustering.—The clustering technique uses groups of multivariate data vectors to develop the 
generic patterns. Empirical data sources, with features derived from observations such as temperature and 
pressure, provide samples for these data vectors. The vectors are grouped in clusters according to their 
similarity. These clusters, in turn, are associated with the phenomena to be recognized. 

In general, not all of the information in the cluster data vectors is required to characterize a generic 
pattern. Therefore, a template, or composite vector, which exhibits maximum similarity with the cluster’s data 
vectors, should be developed for the individual clusters. One commonly used template is the mean data vector, 
which contains the mean value of each variable defined by the sample data vectors. These templates represent, 
in a statistical sense, the significant information in the cluster samples. 

Once the patterns of interest have been characterized with a set of templates, the recognition process 
consists of comparing a test sample to each of the templates and deciding which template matches the 
unknown. The best match identifies the unknown as belonging to the phenomena associated with that 
template; a match between an observation and a template implies the presence of specific phenomena. 
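
The template and recognition steps described above can be sketched as follows. This is a minimal illustration under stated assumptions: the cluster names and data values are invented, and the correlation coefficient is used as the similarity measure only for concreteness.

```python
import numpy as np

def mean_template(cluster_vectors):
    """Mean data vector of a cluster (rows are samples, columns are variables)."""
    return np.mean(np.asarray(cluster_vectors, dtype=float), axis=0)

def best_match(sample, templates):
    """Return the name of the template most similar to the sample, plus all scores."""
    scores = {name: float(np.corrcoef(sample, t)[0, 1]) for name, t in templates.items()}
    return max(scores, key=scores.get), scores

# Hypothetical clusters, each associated with one phenomenon.
clusters = {
    "phenomenon_a": [[1.0, 2.0, 3.0, 4.0], [1.2, 2.1, 2.9, 4.2]],
    "phenomenon_b": [[4.0, 3.0, 2.0, 1.0], [4.1, 2.8, 2.2, 0.9]],
}
templates = {name: mean_template(vectors) for name, vectors in clusters.items()}

# An unknown observation is assigned to the best-matching template.
label, scores = best_match(np.array([0.9, 2.2, 3.1, 3.8]), templates)
```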

3.1.3.3.2 Application to SSME.—UTRC applied clustering to the problem of fault detection by assuming 
the engine operates in two states: a nominal state, where the engine exhibits no performance degradation 
associated with a fault; and an off-nominal state, where the engine operates in any state not considered 
nominal. A fault detection algorithm based on the clustering technique was developed; it detects a fault by 
mapping the CADS data into fault/no-fault classes. This algorithm has three major components: a detection 
system database, a function for training and retraining the detection system, and a function for performing 
fault detection (see Figure 3.26). 

Detection System Database.—The algorithm database contains the data required for training and 
detection processing. The parameters associated with training include nominal data sets for establishing 
detection thresholds, the master nominal template, and a list of the active sensors. The parameters associated 
with detection include the test detection thresholds and the modified nominal template. Each of these 
parameters is discussed more fully in the following sections. 

Fig. 3.26 CLUSTERING TECHNIQUE FAULT DETECTION ALGORITHM 

Detection System Training.—Training the detection system involves developing the cluster templates and 
establishing the detection thresholds. Both are discussed in the following paragraphs. 

Template Development.—Templates represent the nominal characteristics of a cluster and are the 
patterns to which current test data is compared for fault detection. The cluster templates are established 
through the training process. Initial training for the clustering algorithm is performed off-line using data taken 
from previous SSME firings. The three principal steps in the training process are: the selection of nominal data 
sets for a training database; the definition, extraction, and clustering of the multivariate data vectors from the 
database; and the creation of a template for each cluster. 

Nominal Database.—The first step in training is the development of the sample population from which 
the templates are derived. Since the detection problem requires only nominal templates, the sample 
population consisted of the two nominal firings: tests 902-457 and 902-463. As explained in Section 2.1.1, the 
two nominal tests contain both CADS and Facility data. However, since Facility data was not available on-site 
for many of the failures, only the CADS data was retained in the database and used for template development. 

The CADS data provides parametric information on multiple SSME LRUs. Initially, the development 
of the templates was restricted to the problem of detecting faults in the HPFTP. Algorithm coverage for the 
other LRUs would follow in a similar manner. Hence, a subset of the CADS data which provided coverage of 
the HPFTP and its inputs and outputs was selected. The list of sensors and their PID numbers is shown 
in Table 3.2a. 


TABLE 3.2a - CLUSTER ANALYSIS SELECTED SENSORS 

Order   PID No.   CADS Label 
  1.       32     LPFP Speed A 
  2.      225     LPFP Discharge Temperature A 
  3.      226     LPFP Discharge Temperature B 
  4.       52     HPFP Discharge Pressure A 
  5.       58     FUEL Preburner Pressure A 
  6.      260     HPFP Speed A 
  7.      261     HPFP Speed B 
  8.      231     HPFT Discharge Temperature A 
  9.      232     HPFT Discharge Temperature A 
 10.       24     MCC Hot Gas Injector Pressure A 
 11.       17     MCC Coolant Discharge Pressure A 
 12.       18     MCC Coolant Discharge Temperature B 
 13.       59     FUEL Preburner Pump Discharge Pressure 


Data Normalization.—Clustering is based on a measure of similarity between multiple variables. Thus it 
is necessary to normalize the data in some manner to allow equal sensitivity to each of the sensors, whether 
they measure temperature, speed, or pressure. Data normalization is typically achieved by using population 
parameters, such as the mean and variance of a sensor signal. For this application, data normalization was 
complicated by the fact that the data did not appear to be distributed in a manner characterized by a typical 
distribution function and, further, because the nominal data sample was insufficient to characterize the 
population. 

It was found, however, that normalization of the sensors could be achieved by using the PBM estimates of 
each CADS sensor in conjunction with the CADS data. A data vector, d(i,t), composed of the multiple sensors 
listed in Table 3.2a, was normalized at each time sample, t, using the following equation: 

$$ d(i,t) = \frac{\text{PBM Parameter Estimate}(i, PL(t)) - \text{sensor value}(i,t)}{\text{PBM Parameter Estimate}(i, PL(t))} $$ 

where i = sensor index 
      t = time at which the sample was collected 
      PL(t) = power level at time t. 

This equation can be restated in the form: 

$$ d(i,t) = 1 - \frac{\text{sensor value}(i,t)}{\text{PBM Parameter Estimate}(i, PL(t))} \qquad (12) $$ 


The PBM Parameter Estimates for each selected sensor were determined by extracting values at seven 
rated SSME power levels (65%, 70%, 80%, 90%, 100%, and 104% RPL) from the PBM output, and fitting a 
polynomial to these values. The polynomial coefficients for the parameters are listed in Table 3.2b, and were 
utilized in the cluster algorithm as follows: 

$$ \text{PBM Parameter Estimate}(i, PL(t)) = c_1\,PL^3(t) + c_2\,PL^2(t) + c_3\,PL(t) + c_4 $$ 

where i = sensor index 
      PL(t) = power level (%RPL) at time t. 


TABLE 3.2b - COEFFICIENTS FOR PBM PARAMETER ESTIMATE POLYNOMIALS 

                        POLYNOMIAL COEFFICIENTS 
PID No.   c1            c2             c3             c4 
   32     0             6.2331e-01    -4.0070e+01     1.3357e+04 
  225     1.1072e-05   -2.5400e-03     1.9834e-01     3.7197e+01 
  226     1.1072e-05    1.9834e-01     1.9834e-01     3.7197e+01 
   52     1.0601e-02   -2.4529e+00     2.4773e+02    -4.7743e+03 
   58     2.2294e-03   -4.0778e-01     7.9094e+01    -1.1197e+03 
  260     0             0              2.1157e+02     1.3126e+04 
  261     0             0              2.1157e+02     1.3126e+04 
  231     3.8781e-03   -9.3386e-01     7.8306e+01    -7.2104e+02 
  232     3.8781e-03   -9.3386e-01     7.8306e+01    -7.2104e+02 
   24     0             0              3.3214e+01    -9.8566e+01 
   17     6.3262e-03   -1.4614e+00     1.5783e+02    -2.8293e+03 
   18     0             0             -1.2553e-01     4.7722e+02 
   59     0             0              8.3784e+01    -1.3382e+03 

An example of a normalized data vector is shown in Figure 3.27. This is a bar plot of the d(i,t) vector for 
all thirteen sensors at one sample point in time. The x-axis is the PID number for a CADS sensor from Table 
3.2a, and the y-axis is the value of d(i,t) for each PID. 
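
The normalization of Eq. 12 and the polynomial PBM estimate can be sketched as follows. This is an illustration only: the function and variable names are assumptions, the coefficients for PIDs 18 and 52 are copied from Table 3.2b, and the sensor reading used in the example is invented.

```python
# (c1, c2, c3, c4) for two sensors, copied from Table 3.2b.
PBM_COEFFS = {
    18: (0.0, 0.0, -1.2553e-01, 4.7722e+02),                  # MCC Coolant Discharge Temperature B
    52: (1.0601e-02, -2.4529e+00, 2.4773e+02, -4.7743e+03),   # HPFP Discharge Pressure A
}

def pbm_estimate(pid, power_level):
    """Cubic polynomial PBM estimate; power_level is in %RPL (e.g. 100.0)."""
    c1, c2, c3, c4 = PBM_COEFFS[pid]
    return c1 * power_level**3 + c2 * power_level**2 + c3 * power_level + c4

def normalize(pid, sensor_value, power_level):
    """Eq. 12: d(i,t) = 1 - sensor value / PBM estimate."""
    return 1.0 - sensor_value / pbm_estimate(pid, power_level)

# Hypothetical reading 2% below the PBM estimate at 100% RPL.
estimate_52 = pbm_estimate(52, 100.0)
d_52 = normalize(52, 0.98 * estimate_52, 100.0)
```

A reading equal to the PBM estimate normalizes to zero, so each element of d(i,t) expresses the fractional deviation of a sensor from its predicted value.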

Nominal Template Development.—The normalized data vectors, d(i,t), were used to define a template for 
nominal operation of the SSME. The d(i,t) vectors were computed for mainstage operation of the nominal test 
902-463. The data was then clustered by calculating the correlation coefficient between each pair of sample 
vectors, and grouping those with a correlation value greater than 0.95 into a cluster. A template was created 
from the cluster by averaging the d(i,t) vectors of the cluster over the time index. The resultant template is 
shown in Figure 3.28. 
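
The clustering and averaging steps can be sketched as follows. This is a sketch under assumptions, not the procedure actually used: the report does not state how the group was formed, so the example simply collects every vector whose correlation with a chosen seed vector exceeds 0.95, and the data values are invented.

```python
import numpy as np

def build_nominal_template(vectors, seed_index=0, min_corr=0.95):
    """Average the d(i,t) vectors whose correlation with a seed vector exceeds min_corr."""
    vectors = np.asarray(vectors, dtype=float)   # shape: (time samples, sensors)
    corr = np.corrcoef(vectors)                  # correlation between every pair of samples
    members = corr[seed_index] > min_corr        # cluster membership relative to the seed
    return vectors[members].mean(axis=0), members

# Hypothetical normalized vectors: three time samples of four sensors.
d = [
    [0.010, -0.020, 0.030, 0.000],
    [0.011, -0.019, 0.031, 0.001],
    [-0.030, 0.020, -0.010, 0.040],   # dissimilar pattern, excluded from the cluster
]
template, members = build_nominal_template(d)
```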

The nominal template shown in Figure 3.28 was tested to determine its degree of similarity to the two 
nominal data sets. The similarity was quantitatively measured by the correlation between the template and the 
sample data vectors. Figure 3.29 shows the correlation between each d(i,t) vector of nominal test 902-463 and 
the nominal template. As seen in the plot, the correlation is approximately 1 for most of the run. The exceptions 
are during rapid power transients and during operation at 65% RPL, where the correlation falls to 0.95. 

Similarly, for nominal test 902-457, the correlation between each sample vector and the nominal template 
was computed. The results are plotted in Figure 3.30. The correlation remained greater than 0.95 for the 
majority of the data except, again, during the rapid power transients. The template was thus considered to be 
an adequate characterization of nominal engine operation based on this limited data set. 

Detection Thresholds.— Two detection thresholds must be established prior to a test: an event detection 
threshold, and a fault detection threshold. The event detection threshold is required by the algorithm to 
decide whether an observation, d(i,t), matches the nominal template, while the fault detection threshold is 
required to determine if a significant number of event detections have occurred for a fault indication. 

The event detection threshold is calculated by the steps shown in Figure 3.31. First, the list of valid PIDs 
for the test is read from the Detection System Database and used to create a modified nominal template. The 
modified template is derived from the original template by removing those PIDs not contained in the list of 
valid PIDs for the specific test. Next, the correlation coefficients are calculated between the modified template 
and the 902-463 nominal test data. A histogram of the correlation coefficients is then computed and the 
correlation value for the 1st percentile is determined. This correlation value is multiplied by a scale factor of 
0.95 and saved as the event threshold. The scaling by 0.95 lowers the threshold, thereby reducing 
the chance of a false alarm. 
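
The threshold calculation can be sketched as follows. The function names are assumptions, `numpy.percentile` stands in for reading the 1st percentile from the histogram, and the template and correlation values are synthetic.

```python
import numpy as np

def modified_template(template, pids, valid_pids):
    """Drop template elements whose PID is not in the list of valid PIDs for the test."""
    keep = [i for i, pid in enumerate(pids) if pid in valid_pids]
    return np.asarray(template, dtype=float)[keep], keep

def event_threshold(nominal_corr, percentile=1.0, scale=0.95):
    """Scaled 1st-percentile value of the nominal correlation coefficients."""
    return scale * float(np.percentile(nominal_corr, percentile))

# Hypothetical three-sensor template with one PID unavailable for the test.
reduced, kept = modified_template([0.1, 0.2, 0.3], pids=[32, 225, 226], valid_pids={32, 226})

# Synthetic nominal correlation coefficients between 0.96 and 1.0.
nominal_corr = np.linspace(0.96, 1.0, 101)
threshold = event_threshold(nominal_corr)
```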

The event threshold is used to make the initial decision that a potential fault exists. The final decision is 
made using the fault detection threshold. The fault detection threshold is set to minimize the probability of a 
false alarm. Since the data populations are unknown, the fault detection threshold was selected using a false 
alarm probability derived from the 902-463 nominal data. An m-out-of-n detector was used for fault detection. 
Based on the false alarm probability of test 902-463, m was set to 5 and n was set to 5. These values remained 
fixed for all further clustering algorithm tests. 

Fault Detection.—Prior to the initiation of an engine test, the training module defines the template and 
the detection thresholds for the fault detection system. At runtime, the detection module processes the CADS 
data to detect engine faults. The fault detection module, shown in Figure 3.32, is composed of the functions for 
template/data correlation, event detection, and fault detection. 

Fig. 3.29 CORRELATION BETWEEN NOMINAL TEMPLATE AND 
TEST 902-463 NOMINAL DATA. 

Fig. 3.30 CORRELATION BETWEEN NOMINAL TEMPLATE AND 
TEST 902-457 NOMINAL DATA. 

Fig. 3.31 CALCULATION OF EVENT DETECTION THRESHOLD 

Fig. 3.32 FAULT DETECTION PROCESSING 

The CADS data was first normalized by the procedure described above, then the correlation function 
performed a correlation between the CADS data and the current template for each time sample. The function 
output was a correlation value between -1 and 1 for each time sample. 

The correlation values and the event detection threshold were input to the event detection function, which 
compared the correlation value at each time sample to the event threshold. If the correlation value for the 
current sample crossed the threshold (that is, fell below it, indicating a poor match to the nominal 
template), an event was declared by outputting a 1. If the correlation value was within the threshold, the 
detector output a 0. 

The event detections were counted in the fault detector as they were output from the event detector. 
The fault detector counted the events sequentially for 5 samples (n = 5). A fault was declared if the sum of 
the event counts reached the fault detection threshold of 5 events (m = 5). The output of the fault detector 
was a 1 when the threshold was reached, and a 0 otherwise. 
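
The event and fault detection logic can be sketched as follows. This is a minimal sketch under two assumptions that the report does not state explicitly: an event is declared when the correlation falls below the event threshold, and the n samples form a sliding window over the most recent event flags. The function name and the correlation values are invented.

```python
from collections import deque

def detect_fault(correlations, event_threshold, m=5, n=5):
    """Return (event, fault) flags per sample using an m-out-of-n rule."""
    window = deque(maxlen=n)   # holds the most recent n event flags
    flags = []
    for corr in correlations:
        event = 1 if corr < event_threshold else 0   # poor match to the nominal template
        window.append(event)
        fault = 1 if sum(window) >= m else 0          # m events within the last n samples
        flags.append((event, fault))
    return flags

# Two isolated events are ignored; five consecutive events raise a fault.
correlations = [0.99, 0.80, 0.82, 0.99, 0.70, 0.70, 0.70, 0.70, 0.70]
flags = detect_fault(correlations, event_threshold=0.90)
events = [e for e, _ in flags]
faults = [f for _, f in flags]
```

With m = n = 5, a single sample that matches the nominal template resets the count, which is what keeps brief excursions during power transients from raising a fault.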

Factors Affecting Algorithm Performance.—Empirically derived algorithms, such as clustering, achieve 
their optimal performance when operated within the constraints used to develop them. For the clustering 
algorithm, these constraints include the sensor set selected for input, the PBM estimators, and the use of the 
correlation value as a measure of the pattern similarity. Changes in any of these constraints may affect the fault 
detection performance of the algorithm. Three major factors which affect the clustering performance are 
currently under study by UTRC and NASA-LeRC. 

The first factor affecting the clustering performance is the makeup of the sensor set selected for input to 
the algorithm. The sensor set shown in Table 3.2a was selected to provide the maximum physical coverage of 
the HPFTP with the sensors available for the nominal tests. The loss of certain individual sensors affects the 
performance of the algorithm more than the loss of others, as observed in the test results given in Appendix A. 
Efforts are underway to establish the sensitivity of the algorithm's fault detection performance to each 
individual sensor. 

The second factor affecting the algorithm performance is the set of equations used to estimate the PBM 
values and the normalization of the data based on those PBM values. The ratio of the measured sensor value to 
its PBM estimated value (Eq. 12) determines the magnitude of the deviations for a given template. As this ratio 
approaches one, small components of the sensor signal, which otherwise would not affect the algorithm, begin 
to dominate. A better estimator of nominal engine operation or a change in the engine such that it operates 
closer to its predicted state are two examples of conditions which will cause the normalization ratio to 
approach unity. Once this condition exists, factors such as sensor signal noise and errors in the PBM estimator 
due to engine power level changes tend to dominate the result of Eq. 12, and thus produce erratic results for 
the fault detection. Furthermore, it can be shown that the PBM estimator in Eq. 12 acts as a weighting function 
which emphasizes each parameter according to its accuracy of estimation. In this manner, certain parameters 
may be unintentionally weighted to have more of an effect on the clustering algorithm (see Sections A-10 and 
A-13). Therefore, it is important that the PBM estimation coefficients listed in Table 3.2b be used for the 
clustering algorithm to achieve successful results. UTRC and NASA-LeRC are studying the weighting 
function of the PBM estimator to improve algorithm performance and robustness. One potential 
improvement is the selection of a weighting value for each parameter based on engineering judgement 
rather than an arbitrary value. 

The third factor which affects the clustering algorithm performance is the use of correlation to compare 
the test template with the nominal template. As stated previously, this technique may produce erratic results 
when the normalization ratio approaches one. UTRC and NASA-LeRC are investigating other similarity 
measures which may be more appropriate. 

3.1.4 Summary of Failure Detection Algorithms.— The objective of this task was to identify and evaluate 
fault detection algorithms that meet the HMS program goals for incorporation into the HMS framework 
architecture. In accordance with program requirements, three performance evaluation criteria were 
established for each failure detection algorithm: probability of failure detection, probability of false 
alarm, and time of detection. 

A data-driven approach to the algorithm development process was taken because of inadequately 
defined fault characteristics which precluded the definition of precise analytical models of failure modes. The 
lack of analytical programs for fault modeling, and the availability of a large SSME database of nominal and 
failure data also contributed to the decision to use empirical methods. Furthermore, empirical methods are 
more suitable for fault detection in the HMS real-time environment. Analysis showed that faults manifest 
themselves in the SSME data as long duration trends, quick transitions, or oscillatory variations; each must be 
detected by the failure detection scheme. Finally, in lieu of sufficient nominal data, information from the 
SSME analytical models (PBM and DTM) was incorporated into the fault detection algorithms. 

The HMS failure detection algorithms developed by UTRC successfully cover all modes of SSME 
operation. A nonlinear regression algorithm (RESID), which exploits the nonlinear relationships between 
engine parameters, was used to detect failures during the open-loop startup and shutdown modes. Fault 
detection during SSME mainstage operation was covered by both time series analysis and cluster analysis. 
The time series ARMA models use the behavior of past data to predict the behavior of future data and are 
capable of detecting rapid or oscillatory failures during mainstage. Cluster analysis utilizes the pattern of 
differences between measured and design point data to detect gradual, slow trend failures, as well as rapid 
failures. 

The UTRC failure detection algorithms were run on test data from a total of 16 failure incidences and 2 
nominal tests. The individual algorithms, when used with a complete sensor set, had no false alarms when 
tested on nominal data. Although the algorithms are generally robust to sensor loss, the results in Appendix A 
show that the clustering algorithm, in three cases, was sensitive during power transitions to the loss of certain 
sensors. Further study of this sensitivity is currently proceeding. Table 3.3 presents the failure detection times 
for the time series ARMA, RESID, and clustering algorithms. For each test, the UTRC HMS algorithm 
detection times are compared to those from SAFD and redline cutoff. The number of sensors missing for each 
test is also indicated. 

The failure detection times were earlier than the redline cutoff times and the SAFD detection times 
except in cases of structural failures, where there were no prior indications. In most cases, the failures were 
detected early enough to allow for a normal engine shutdown. The UTRC HMS failure detection scheme is 
effective because it does not rely on a single algorithm or a single sensor measurement. The ARMA and 
clustering algorithms provide double coverage during mainstage operation, and have proven to be robust to 
sensor loss. Furthermore, the fact that the algorithms detected failures in data covering an engine 
development and test period from 1977 to 1989 demonstrates that they are capable of handling engine 
build-to-build variations. 

Table 3.3 ALGORITHM PERFORMANCE - DETECTION TIMES 

                                    UTRC HMS 
            SENSORS    --------------------------------------- 
Test No.    MISSING    Non-Linear    Cluster            ARMA      SAFD      RED-LINE 
901-110        4          N/A        Missing PC Data     16.0     N/A         74.1 
901-436        0          N/A        302.4               70.0     N/A        611.0 
901-364        1          N/A         42.7              210.0     216.7      392.2 
901-307        3          N/A          8.6                9.0      55.5       75.0 
902-198        0          N/A          5.6                8.5                  8.5 
902-249        1          N/A          5.2              160.0     388.2      450.6 
901-225        2          N/A        255.6               16.0     255.6      255.6 
750-168        1         300.2       300.2               N/A      N/A        300.2 
901-284        5           3.9         5.2                9.0       5.2        9J 
750-259        1          N/A        101.5              101.5     N/A        101.5 
901-173        6          N/A        102.1              188.0     188.9      201.2 
901-331        4          N/A         50.2              233.0     N/A        233.1 
901-222        2           4.3        N/A                N/A      N/A          4.3 
901-340        4          N/A        405.5               12.2      12.2      405.5 
SF10-01        9          N/A         N/A               104.8     N/A        104.8 
SF6-01      * - Corrupted Data                            -       N/A         18.6 

3.1.5 Failure Detection System.—The algorithms described above have shown their individual capabilities 
to detect failures within the engine. A simplistic failure detection system such as the one shown in Figure 3.33 
could be designed to provide detection of the engine failures. The system would consist of parallel detection 
algorithms whose inputs are single or multiple CADS sensors, and whose outputs would be binary fault/no-fault 
decisions. Each output would have equal opportunity to shut down the engine. Such a system is simple and fast 
due to its parallel decision making and lack of vertical integration. However, the system is prone to a higher 
false alarm probability and a lower detection probability. Introducing some vertical integration into the 
design will reduce these two deficiencies and only slightly degrade the system speed. 

Fig. 3.33 SINGLE LEVEL FAULT DETECTION SYSTEM 

The detection system shown in Figure 3.34 is a hierarchical design which will provide fast and accurate 
detection of engine failures. The hierarchical design inputs the data as previously described in the single level 
scheme, but the outputs of the detection modules (ARMA, RESID, CLUSTER) are now integrated in the next 
higher level of the system to assess the health of the individual LRUs. This level then outputs its results to the 
next level for integration and an assessment of the overall engine status. At any point in the vertical direction, 
the output could be used to make a fault/no fault decision. However, a more robust detection methodology is 
achieved by using further integration. Further details of this design are given in Section 4.1. 

3.2 Advanced Technology Sensors 

The fault detection performance of an HMS for virtually any mechanical system is directly related to the 
quality of the information provided by sensors which monitor the system. A limitation common to retrofit 
installations of HM systems is that the existing set of sensors is typically directed at control functions, rather 
than health monitoring. The main purpose of the SSME sensor suite is to provide the ability to assess the 
performance of the engine, while providing information to the controller so that the thrust and mixture ratio 
can be controlled. Gas turbine and rocket engine controllers typically have a frequency response of a few hertz, 
and therefore, require the sensor signal to be conditioned with low pass filters so that the high frequency signal 
components are removed. Fault precursor information which is indicated by high frequency fluctuations in 
the sensor signal is therefore lost due to this conditioning. The ability of the HMS to access the raw, unfiltered 
sensor signal before it is processed by the controller may provide improved fault detection capability. 

The performance of an HMS can also be dramatically improved through the use of sensors specifically 
directed at component health monitoring. This is illustrated in Figure 3.35 as a graph of detected flaw size 
versus fault detection time. Sensors which are directed at health assessment rather than performance can 
detect smaller flaws, which translates into an earlier time of detection. The decrease in time to detection 
provides more flexibility in the actions that the HMS can take in response to the presence of the flaw. The 
penalty paid for this improved performance is that the component health assessment sensors are usually not 
easily retrofitted to the engine. 

The fault detection algorithms presented in the previous section have demonstrated excellent 
performance in nearly all cases tested using the existing CADS sensor information. These fault detection 
methodologies form the core of the HMS. Further improvements and enhancements can be achieved through 
the incorporation of new sensor information derived from both existing and new sensors on the SSME. As 
part of this program, UTRC evaluated the existing set of SSME sensors to assess the potential for extracting 
more information from the sensor signal if that signal was available unconditioned by the controller. 
Additionally, new advanced technology sensors were evaluated for potential incorporation in the HMS 
provided that they were nonintrusive and would be available for ground testing within 5 years. 

3.2.1 Existing SSME CADS Sensors.—As previously discussed, the existing SSME CADS sensor suite 
consists of pressure, temperature, flow, and position sensors directed at monitoring engine performance and 
providing control information. The temperature sensors have an installed frequency response of 1 to 10 Hz, as 
stated in the Block II controller specification. The pressure sensors have a 100 Hz minimum frequency 
response. The accelerometers for the Flight Accelerometer Safety Cutoff System (FASCOS) have 
minimum responses which range from 10 to 100 kHz. The controller conditions all pressure and temperature 
signals with a 21 ± 9 Hz low pass filter before the analog-to-digital conversion occurs at 25 Hz. The 
accelerometer signals are bandpass filtered between 50 and 800 Hz before energy computations are made. 
Thus, the filtering by the controller removes any potential high frequency information from the sensor signals 
that may indicate the onset of a fault within the SSME. This is especially true for the pressure sensors and 
accelerometers, which have frequency responses well above 21 Hz. 
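
The effect of this conditioning can be illustrated with a short calculation. This is a sketch under a stated assumption: the order of the controller filter is not given here, so an ideal first-order low pass response is assumed, and the function names are invented. A higher-order filter would attenuate high frequencies more strongly.

```python
import math

def first_order_lowpass_gain(freq_hz, cutoff_hz):
    """Magnitude response of an ideal first-order low pass filter (assumed filter order)."""
    return 1.0 / math.sqrt(1.0 + (freq_hz / cutoff_hz) ** 2)

def nyquist_hz(sample_rate_hz):
    """Highest frequency that can be represented at the given sampling rate."""
    return sample_rate_hz / 2.0

# A 100 Hz pressure fluctuation passed through a nominal 21 Hz filter.
gain_at_100_hz = first_order_lowpass_gain(100.0, 21.0)

# Frequency limit imposed by the 25 Hz analog-to-digital conversion.
limit_hz = nyquist_hz(25.0)
```

Under this assumption a 100 Hz component retains only about a fifth of its amplitude after filtering, and sampling at 25 Hz cannot represent content above 12.5 Hz in any case.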

Although no documented study has been done to date for liquid rocket engines, an example of high 
frequency pressure fluctuations providing fault precursor information can be found in gas turbine engines. A 
phenomenon known as rotating stall can occur in gas turbine engines at certain points in the aircraft flight 
envelope. This condition causes the individual compressor blades to stall or lose their lift resulting in a loss of 
airflow through the engine. Rotating stall, as indicated by its name, is not a static condition. The stall is 
exhibited by a pressure wave that rotates around the compressor. UTRC and P&W are investigating methods 
to detect the onset of this phenomenon through analysis of the signals from pressure sensors located in the 
compressor section. Efforts to date have revealed that significant precursor information is exhibited in 
pressure signals with a 1 kHz bandwidth. Filtering the pressure signal with a 21 Hz lowpass filter prevented the 
onset of the stall from being detected with the pressure information. Similarly, faults in the SSME which are 
indicated by high frequency pressure fluctuations may be detected through analysis of the unprocessed 
pressure transducer signals. 

The UTRC HMS framework will thus include the capability for collecting and analyzing raw, 
unprocessed sensor information. Obtaining the raw CADS sensor signals will require interface electronics 
that tap into the sensor-to-controller signal lines. There are several techniques to accomplish this. The major 
issue will be maintaining the integrity of the sensor signals to the controller. The potential that a failure in 
either the interface electronics or any other part of the HMS could interrupt the information to the SSME 
controller must be eliminated. As will be shown in the implementation plan, the utility of the high frequency 
information as a diagnostic tool will be first demonstrated on facility sensors which have a less strict data 
integrity requirement. Once the technique has been demonstrated on the facility sensors as reliable, a similar 
approach can be taken with the CADS sensors. 

The processing algorithms that will be required to extract the fault precursor information will not be 
known in specific detail until such data is available for study. It is anticipated that time series analyses (such as 
the ARMA technique previously discussed) will be appropriate for detecting the change in the signal structure 
which indicates an anomaly. Spectral estimation techniques such as the Fast Fourier Transform (FFT) may 
also be used to identify characteristic frequency patterns which may identify the fluctuations in the sensor 
signal. The HMS breadboard design will include the capability to perform time series and spectral estimation 
on the raw sensor signals from selected CADS and Facility sensors. 
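
A spectral estimate of this kind can be sketched as follows. This is an illustration only: the function name, the 2 kHz sampling rate, and the synthetic pressure signal with a 300 Hz fluctuation are all assumptions, since no raw sensor data is yet available.

```python
import numpy as np

def dominant_frequency(signal, sample_rate_hz):
    """Frequency (Hz) of the largest spectral peak after removing the mean level."""
    signal = np.asarray(signal, dtype=float)
    spectrum = np.abs(np.fft.rfft(signal - signal.mean()))
    freqs = np.fft.rfftfreq(signal.size, d=1.0 / sample_rate_hz)
    return float(freqs[np.argmax(spectrum)])

# Synthetic raw pressure signal: steady level plus a small 300 Hz fluctuation.
sample_rate_hz = 2000.0
t = np.arange(2000) / sample_rate_hz                      # one second of samples
raw = 3000.0 + 5.0 * np.sin(2.0 * np.pi * 300.0 * t)

peak_hz = dominant_frequency(raw, sample_rate_hz)
```

A fluctuation at this frequency would not survive the 21 Hz conditioning filter, which is why the analysis has to be applied to the unprocessed signal.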

3.2.2 Near-Term Advanced Technology Sensors .— Numerous sensing technologies directed at developing 
dedicated diagnostic sensors for certain rocket engine components are being developed in government, 
university, and commercially sponsored programs. The goal of this portion of the program was to recommend 
advanced technology sensors for incorporation in the HMS framework to improve its fault detection 
performance. As part of the SSME-ATD program, the ALS Rocket Engine Condition Monitoring program, 
and UTC sponsored internal research programs, UTRC and P&W have been assessing numerous sensor 
technologies for incorporation in rocket engine and gas turbine systems. A list of these technologies, which 
encompasses many industry, government, and university programs for potential rocket engine applications, is 
shown in Table 3.4. A number of these programs are directed at combustion diagnostics which extract 
information from the exhaust plume of the engine. Others are directed at providing diagnostic information 
about specific rocket engine components or subcomponents, such as turbopumps and bearings. The sensor 
technologies include optical methods as well as advanced pressure, temperature, and leak detection 
transducers which, when coupled with signal processing techniques, provide unique information for rocket 
engine component health monitoring. 

Each of the sensors listed in Table 3.4 is at a different level of maturity, has different requirements for 
signal processing, and differs in the extent of modification to the SSME required for its installation. Following 
the program guidelines to assess nonintrusive, near-term technologies along with those technologies being 
demonstrated under the SSME-ATD program, UTRC narrowed the list of potential candidates for inclusion 
in the HMS to those listed in Table 3.5 based on the following four additional criteria: 

1. Nonintrusive in SSME application (except if part of SSME-ATD program) 

2. Capable of ground testing within 5 years 

3. Capable of real time operation during engine operation 

4. Provides specific diagnostic information or health assessment of a component or components 

The inclusion of intrusive SSME-ATD sensors was based on the possibility that these sensors, once qualified, 
may be placed on the production SSME-ATD turbopumps. This sensor information would be utilized by the 
HMS only when an SSME with a P&W turbopump was being operated. The fourth requirement was included 
so that the recommended sensors would provide a direct assessment of the health of specific components of 
the SSME, as opposed to simply a performance measurement made by a better, new-technology sensor. The 
following sections provide a brief description of these sensing technologies. 

3.2.2.1 Plume Spectroscopy. — Plume spectroscopy is a comprehensive sensing technology being used to 
identify and quantify the spectral features observed in the SSME exhaust plume. Both normal and anomalous 
component wear will be evidenced by combusting particles and vapors in the engine plume. The plume 
spectrometer detects ultraviolet, visible, and infrared (0.2-1.5 microns) emission and absorption of ionized 
species within the SSME plume. The spectral lines of these species can be correlated to internal engine erosion 
and degradation. This technology has been demonstrated by NASA and Rocketdyne to be useful for detecting 
SSME bearing cage failure and injector erosion. 

An Optical Plume Anomaly Detection System (OPADS), developed by Sverdrup under the direction of 
NASA-MSFC, is currently available for ground test use at SSC [2]. This nonintrusive optical detection system 
will greatly enhance detection of failures due to engine and component wear. The spectra of normal plumes 
and those peculiar to verified engine anomalies are being analyzed and characterized. A database will be 
developed through testing to identify those intensity patterns which correlate with faults. The intensity of the 
spectral lines apparent in the SSME plume will show the presence of species indicative of engine wear and 
erosion. Figure 3.36a depicts Sodium (Na) and Potassium (K) emission in a LOX/Methane plume spectrum, 
while Figure 3.36b shows a typical mid-IR spectrum for the SSME plume. It is anticipated that maturing 
this technique into a useful real-time diagnostic monitor will be supported by existing NASA 
programs. 

Table 3.4 CANDIDATE SENSOR TECHNOLOGIES FOR SSME HMS 

 1. Acoustic Emission 
 2. Optical Pyrometer 
 3. Surface Layer Activation 
 4. Plume Spectroscopy 
 5. Raman Temperature Profiling 
 6. CARS 
 7. Fiber Optic Deflectometer 
 8. Laser Vibration Sensor 
 9. Mass Spectrometer Leak Detection 
10. Infrared Absorption Leak Detection 
11. Solid State Leak Sensors 
12. Microwave Turbine Blade Clearance Sensor 
13. Cryogenic Mass Flowmeter (Vortex Shedding) 
14. Microelectronic Pressure Sensor 
15. Thin Film Thermocouples and Strain Gauges 
16. IR Gas Pyrometer 
17. Ultrasonic Flowmeter 
18. Plume Electrical Diagnostics 
19. Plume Species Concentration, Velocity, and Temperature Mapping 
20. Laser Anemometry 
21. Twin Core Fiber Optic Strain and Temperature Measurement 
22. High Temperature Heat Flux Sensor 
23. Nonintrusive Turbopump Speed Sensor 
24. Flame Ionization Detector 
25. Holographic Leak Detection 
26. Emission Intensity Distribution Spectroscopy 
27. Inductive Debris Monitor 
28. Integrated Optic Pressure Sensor 
29. Capacitive Turbine Blade Temperature Sensor 
30. Laser Turbine Blade Tip Clearance Sensor 
31. Polyvinylidene Fluoride Sensor 

Table 3.5 CANDIDATE ADVANCED TECHNOLOGY SENSORS THAT 
CAN SIGNIFICANTLY IMPROVE THE PERFORMANCE OF 
A NEAR TERM HMS 

Sensor                            Faults                  Relative Computational Req'mts 
Plume Spectroscopy                Erosion/Wear            Low 
Acoustic Emission (ATD)           Bearing Faults          High 
Optical Pyrometer (ATD)           Turbine Blade Faults    Medium 
Polyvinylidene Fluoride Sensor    Leaks, Burn Through     Low 
Solid State Leak Sensors          Leaks                   Low 
Plume Electrical Diagnostics      Erosion/Wear            High 
Fiber Optic Deflectometer         Bearing Faults          High 
Laser Vibration Sensor (ATD)      Bearing Faults          High 

The relative computational requirements for the HMS will be low, as the output of the plume 
spectrometer "sensor" will be the intensity of preselected spectral lines which correspond to species of interest. 
Fault detection signal processing will consist of individual species redlines in combination with more complex 
pattern recognition and correlation analyses very similar to the clustering analyses presented in Section 3.1.3.3. 
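
The species redline portion of this processing can be sketched as a simple per-line limit check. The sketch below assumes the spectrometer delivers one intensity value per preselected species; the species names, units, and redline values are hypothetical and chosen only for illustration.

```python
def check_redlines(intensities, redlines):
    """Return the sorted species whose line intensity exceeds its redline.

    Species without a configured redline are ignored.
    """
    return sorted(
        species
        for species, level in intensities.items()
        if species in redlines and level > redlines[species]
    )

# Hypothetical redlines and one spectrometer reading (arbitrary intensity units).
REDLINES = {"Ni": 5.0, "Fe": 8.0, "Cr": 4.0}
reading = {"Ni": 6.2, "Fe": 3.1, "Cr": 4.0, "OH": 90.0}
violations = check_redlines(reading, REDLINES)
```

The pattern recognition and correlation analyses mentioned above would operate on the full set of intensities rather than on each line separately.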

3.2.2.2 Acoustic Emission Bearing Diagnostics. — Acoustic Emission (AE) sensors monitor high 
frequency stress waves which result from the interaction of bearing components. As the frequency range (100 
kHz - 1 MHz) monitored is well above the low frequency noise generated by gears, seals, and fluid flows, 
AE monitoring has demonstrated earlier and more quantitative detection of bearing degradation and faults 
than that obtained from analysis of accelerometer data. 

AE monitoring of bearings is under development at UTRC, and is being demonstrated as part of the 
SSME-ATD program as a health assessment technique for the turbopump ball and roller bearings[3]. The 
patented UTC Point Contact Transducer (PCT) was modified to withstand the cryogenic temperatures, high 
pressures, and LOX environment encountered in the turbopumps. Currently, this technique is nearing the end 
of the bearing rig demonstration phase of the program, and will be incorporated into several of the design 
verification SSME-ATD turbopumps. The sensor requires contact with the component to be monitored and 
is therefore intrusive. However, efforts are underway to develop the technique for sensors which are mounted 
external to the bearing race, either within the turbopump housing or on the exterior of the housing. 

Bearing rig tests which simulate the load, speed, and cryogenic environment of the turbopumps have 
demonstrated the outstanding capability of this device to provide bearing health information. Subtle defects 
such as roller element instability and cage rubbing have been detected prior to any indication in the signals 
from internal accelerometers mounted on the bearing support. Figures 3.37 and 3.38 illustrate the acoustic 
emission signatures for stable and unstable roller elements. 

Bearing health features are extracted from the AE signal time and frequency domains and analyzed by 
correlation and pattern recognition techniques to identify the state of operation of the bearing. The high 
frequency AE signal requires more complex hardware and signal processing software than what is typically 
used for vibration monitoring. Efforts are being made to reduce these requirements by implementing 
specialized analog preprocessing of the AE signal, and thus facilitate real-time operation. 

3.2.2.3 Optical Pyrometer. — An optical pyrometer is an advanced technology sensor which will be used to 
monitor turbine blade health. Fiberoptic probes with indium-gallium-arsenide (InGaAs) detectors measure 
the radiant energy from turbine blades during engine operation to provide a linear map of blade temperature 
from root to tip. 

An Optical Pyrometer is being developed as part of the SSME-ATD program for the HPFTP[4]. The 
pyrometer probe takes five radial measurements from the turbine blade root to tip as the blade passes. The 
probe is designed to collect a fixed percentage of the radiant energy emitted from the surface of the blade. The 
radiant energy emitted from the blade surface is transmitted through the fiber optic cable and converted by 
the external detector to a temperature measurement to provide a temperature profile of the turbine blades 
during engine operation. Blade temperature is indicative of its condition: as a blade cracks, its ability to 
conduct heat away is reduced and consequently, the blade becomes hotter. This information about the 
operating conditions of the turbopump can be utilized in fault detection algorithms and provide specific 
diagnostic information about turbine health. 

The likely use of this sensor for real-time monitoring will be to detect hot spots on the turbine blades. 
The high frequency bandwidth of the sensor signal (500 kHz) requires both analog and digital signal 
processing to obtain the temperature measurements. The detection of hot spots is computationally simple and 
would thus impose little additional burden on the HMS. The complexity of the processing increases as the 
amount of the information about the turbine blade temperature map increases. 
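
A minimal sketch of hot spot detection follows, assuming each blade yields five radial temperature readings as described above. A blade is flagged when any reading exceeds the blade-row median at the same radial station by more than a margin. The comparison against the row median, the margin value, and the temperatures are assumptions made for the example, not the SSME-ATD method.

```python
from statistics import median

def find_hot_blades(profiles, margin_k):
    """Return indices of blades with any radial reading that exceeds the
    blade-row median at that station by more than margin_k."""
    stations = len(profiles[0])
    medians = [median(p[s] for p in profiles) for s in range(stations)]
    return [
        i for i, p in enumerate(profiles)
        if any(p[s] - medians[s] > margin_k for s in range(stations))
    ]

# Hypothetical temperatures (K): 4 blades x 5 radial stations, root to tip.
profiles = [
    [1000, 1020, 1040, 1030, 1010],
    [1002, 1018, 1042, 1031, 1008],
    [1001, 1021, 1095, 1029, 1011],  # mid-span hot spot on this blade
    [999, 1019, 1041, 1032, 1009],
]
hot = find_hot_blades(profiles, margin_k=30.0)
```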

3.2.2.4 Polyvinylidene Fluoride Sensor.— Polyvinylidene Fluoride (PVDF) is a synthetic polymer film 
which exhibits piezoelectric and pyroelectric properties. Its characteristic large and durable dipole 
polarization varies linearly with applied stresses such as electric fields, mechanical stress, and temperature 
changes.[5] This allows it to be used to provide electrical signals to monitor mechanical and thermal stresses. 

PVDF film is being investigated as an advanced technology sensor for burn through detection 
applications in the ALS-RECMS program. A blanket of this material would either be affixed to the 
component of interest, or be attached to or embedded in thermal insulation or shields. A hot gas leak would 
burn through the film or create a localized hot spot. A voltage signal would be generated by the film which 
would be unique to the type and location of the burn through. 

The PVDF sensor as a burn through leak detector would require minimal signal processing. Simple 
analog noise reduction techniques followed by thresholding would be used to detect an event. This sensor has 
been demonstrated on MX missile transport canisters as a means to detect canister penetration caused by 
various methods including projectiles, flames, and chemicals. A significant effort would be required to 
implement this technique on the SSME, since no rocket engine application programs for this sensor are 
underway. 

3.2.2.5 Solid State Leak Sensors.— Gas sensitive semiconductors have been developed for some industrial 
applications and are now being evaluated for their utility in the rocket engine environment. The conductivity 
of these devices increases in the presence of combustible gases such as hydrogen, carbon monoxide, methane, 
and propane.[6] The small size of these devices makes them ideal for an array of sensors which monitors 
specific points on the SSME for hydrogen leaks. 

Gas sensitive semiconductor sensors are n-type bulk devices mainly composed of metal oxides such as 
sintered tin dioxide (SnO2). When the sensor is heated in air, oxygen is dissociatively adsorbed on the 
device surface and acquires a negative charge through electron transfer from donor levels in the surface region. 
Consequently, an electron depletion region develops from the surface to the bulk substrate; this region is 
positively charged to balance the surface negative charge of the oxygen. This process forms potential barriers 
against bulk conduction electrons, and the sensor has very high resistance. When combustible gas is supplied 
to the sensor, it is adsorbed by the surface and reacts with the adsorbed oxygen, effectively decreasing the 
potential barriers in the device and reducing resistance. Sensor resistance decreases exponentially with gas 
concentration. Figure 3.39 depicts a typical gas sensitive semiconductor sensor and its associated sensitivity 
characteristics. 
The signal processing requirements for this sensor are very simple. Following analog circuitry that 
measures the resistance change of the sensor, thresholding would be used to identify gas concentrations 
above a preselected level. Readings from several sensors would be used for corroboration and location of the 
leak. Solid state leak sensors are still in their development stage for rocket engine applications. Major issues 
that need resolving are their sensitivity to temperature and other environmental conditions. Furthermore, 
existing sensors are not yet capable of detecting a single specific gas and therefore, may be confused by 
extraneous gases unrelated to the desired leaking gas type. However, it is expected that these sensors could be 
available for ground test within 5 years. 
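
The thresholding and corroboration steps can be sketched as follows. The sketch assumes each sensor reports a position along a line and a derived gas concentration; a leak is confirmed only when a minimum number of sensors exceed the threshold, and its location is estimated as the concentration-weighted centroid of those sensors. The centroid estimate, the function name, and all values are hypothetical.

```python
def assess_leak(readings, threshold, min_sensors=2):
    """readings: list of (position_m, concentration) pairs.

    Returns (leak_confirmed, estimated_position_m); the position is None
    when fewer than min_sensors readings exceed the threshold.
    """
    above = [(pos, c) for pos, c in readings if c > threshold]
    if len(above) < min_sensors:
        return False, None
    total = sum(c for _, c in above)
    centroid = sum(pos * c for pos, c in above) / total
    return True, centroid

# Hypothetical array of four sensors spaced 1 m apart.
readings = [(0.0, 0.1), (1.0, 3.0), (2.0, 1.0), (3.0, 0.2)]
confirmed, position = assess_leak(readings, threshold=0.5)
```

Requiring agreement between at least two sensors is one way to reduce false indications from a single device responding to an extraneous gas.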

3.2.2.6 Plume Electrical Diagnostics. — The SSME exhaust plume has the potential of supporting various 
sensing technologies that exploit the presence of particles and species in the plume to provide specific 
diagnostic information about engine and component erosion and wear. Plume electrical diagnostics is a 
nonintrusive sensing technology which classifies engine events based upon electrostatic gas path signatures. 

Distresses in the SSME, such as turbine blade rubs or combustor burns, will produce particles of debris 
which carry electrostatic charge. Plume electrical diagnostics involves monitoring electrostatic probes in the 
plume to detect these charged particles. As the exhaust gases will have a nominal level of electrostatic charge, a 
background signal will exist. The signal processing and diagnostics must detect changes above this 
background signal, extract features from the probe signals, and employ classification techniques to determine 
if a fault has occurred. 
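
The first of these steps, detecting changes above the background signal, can be sketched by estimating the background mean and standard deviation from an initial quiet window and flagging later samples that deviate by more than a chosen multiple of the standard deviation. The window length, the multiplier, and the probe voltages below are assumptions for illustration only; feature extraction and classification are not shown.

```python
from statistics import mean, stdev

def detect_events(signal, baseline_len, k_sigma):
    """Return indices of samples after the baseline window that deviate
    from the baseline mean by more than k_sigma standard deviations."""
    base = signal[:baseline_len]
    mu, sigma = mean(base), stdev(base)
    limit = k_sigma * sigma
    return [
        i for i in range(baseline_len, len(signal))
        if abs(signal[i] - mu) > limit
    ]

# Hypothetical probe voltages: nominal background followed by a negative pulse.
background = [0.10, 0.12, 0.09, 0.11, 0.10, 0.08, 0.11, 0.09]
probe = background + [0.10, 0.11, -0.60, -0.75, 0.09]
events = detect_events(probe, baseline_len=8, k_sigma=5.0)
```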

Electrostatic techniques have been successfully applied to the analysis of gas turbine engines by P&W, 
HS, Sikorsky, and UTRC Divisions of UTC. It was found that the electrostatic pulses had characteristic 
signatures that could be correlated with known engine events. Turbine blade erosion, for example, manifested 
itself in the electrostatic analysis as a negatively charged pocket in the gas path which, as it moved past the 
electrostatic sensor, produced a 25 ms wide time-varying voltage signature. 

Measurement of plume electrical signatures is relatively simple to implement, but it is rather complex to 
understand the significance of the resulting signal. This technique will need to be tested extensively in the 
SSME environment in order to determine optimal probe placement, and to build a database of electrostatic 
signatures which correlate with engine events. The computational requirements for inclusion of plume 
electrical diagnostics into an HMS are expected to be significant, since they generally include various pattern 
recognition techniques. Some preliminary work on implementing this technique for rocket engines has been 
done by NASA-MSFC. 

3.2.2.7 Fiber Optic Deflectometer. — Fiber optic deflectometers use light reflections to measure outer race 
deflections due to roller element passage in order to quantify bearing and race conditions in the SSME 
turbopumps. This is an intrusive device that requires a through-hole to the bearing outer race. It has been 
included for possible incorporation into the HMS, as it may be configured to fit in the same mounting fixture 
that is used by the acoustic emission sensor on the SSME-ATD turbopumps. 

UTRC has developed deflectometers for various industrial applications to measure surface 
displacements. Rocketdyne has been evaluating a probe manufactured by MTI for potential SSME 
application[7]. The probe consists of a laser light source, fiber optic cables in close proximity to the bearing 
outer race, and a photodetector to receive the light reflected by the outer race. Light from the light source is 
transmitted through the optical fibers, is reflected off of the bearing race, and is transmitted through the 
receiving optical fibers to the photodetector (see Figure 3.40a). The intensity measured is a function of the gap 
width between the cable tip and the bearing outer race. 

Race deflections from normal bearing operation produce a distinctive half-sinusoidal signature at the 
ball pass frequency. Changes in the smooth sinusoid signal are indicative of potential faults in the bearing (see 
Figure 3.40b). Fault detection algorithms will require frequency analysis to extract bearing component 
fundamental frequencies and their harmonics. The computational requirements for this processing will be 
high due to the required spectral estimation techniques. Additionally, pattern recognition may also be 
appropriate for detection of signatures which correlate with unique subtle faults within the bearing. 
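
To indicate which spectral lines such a frequency analysis would look for, the sketch below computes the bearing component fundamental frequencies from bearing geometry. It uses the standard kinematic relations for a rolling element bearing with a stationary outer race, which are not taken from this report; the shaft speed, ball count, and dimensions are hypothetical and do not describe an SSME turbopump bearing.

```python
import math

def bearing_frequencies(shaft_hz, n_balls, ball_d, pitch_d, contact_deg=0.0):
    """Characteristic frequencies (Hz) for a stationary outer race.

    ball_d and pitch_d must share the same length unit.
    """
    ratio = (ball_d / pitch_d) * math.cos(math.radians(contact_deg))
    return {
        "cage": 0.5 * shaft_hz * (1.0 - ratio),
        "ball_pass_outer": 0.5 * n_balls * shaft_hz * (1.0 - ratio),
        "ball_pass_inner": 0.5 * n_balls * shaft_hz * (1.0 + ratio),
        "ball_spin": 0.5 * shaft_hz * (pitch_d / ball_d) * (1.0 - ratio ** 2),
    }

# Hypothetical bearing: 12 balls, 10 mm ball diameter, 50 mm pitch diameter,
# shaft turning at 600 Hz (36,000 rpm).
freqs = bearing_frequencies(shaft_hz=600.0, n_balls=12, ball_d=10.0, pitch_d=50.0)

# Outer race ball pass fundamental and its first harmonics, which the
# deflectometer spectrum would be searched for.
harmonics = [k * freqs["ball_pass_outer"] for k in (1, 2, 3)]
```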

3.2.2.8 Laser Vibration Sensor. — A Fiber Optic Laser Vibration Sensor (FOLVS) has been developed by 
UTRC for vibration measurements in locations where the size and mass of piezoelectric accelerometers 
preclude their use. The FOLVS technique utilizes the principles of a common path interferometer using a 
coherence multiplexing technique to measure vibration in either contacting or noncontacting modes (see 
Figure 3.41). As part of the SSME-ATD program, the FOLVS technique is being adapted to detect bearing 
faults in the HPFTP[4]. 

Using this technique, a fiber optic cable wrapped around the outer race of the bearing directly measures 
vibration. A fiber optic beam splitter splits the light beam from a solid state short coherence length laser diode 
into reference and sensor beams. The reference beam passes through a phase modulator, which changes its 
frequency by a known amount, and then through a fiber optic coil which delays the reference beam by a fixed 
amount. The optical beam coupler recombines the reference and sensor beams such that their path length difference 
exceeds the coherence length, so that interference will not occur. 

This combined beam travels along a single optical fiber allowing external influences to affect both the 
reference and sensor signals. Part of the light is reflected at the partial reflector and returns along the 
transmitting fiber to form a reference beam, while the rest travels the full length of the optical fiber which is 
wrapped around the bearing outer race. This light is totally reflected and returned to the partial reflector. The 
round trip length from the partial reflector to the total reflector is equal to the path length difference 
introduced by the optical delay in the reference beam. Hence, the reflected beam is now coherent with the portion 
from the partial reflector; changes in fiber length or strain in the bearing outer race cause interference. 

The interference signal is an FM signal with frequency of the phase modulator and sidebands indicative 
of the dynamic changes in fiber length. The demodulated signal, proportional to the fiber length change, is a 
measure of strain and vibration in the bearing race. The device has a uniform frequency response from DC to 
tens of megahertz, and is absolutely calibrated to the wavelength of the laser source. 

The computational requirements for bearing fault detection algorithms using the laser vibration sensor 
will be high, and will potentially require both spectral estimation and pattern recognition techniques. 
SECTION 4.0 

FRAMEWORK FOR THE HEALTH MANAGEMENT SYSTEM 

The major purpose of formulating an architecture for the HMS was to demonstrate the 
interrelationships of its various functions and to provide an assessment of the hardware and software 
complexity required to implement the focused HMS. The formulation of the HMS framework architecture 
began with the establishment of the system requirements. The main requirements were: the real-time, 
simultaneous operation of various fault detection algorithms; the use of existing SSME instrumentation; the 
use of near-term technology hardware; the incorporation of nonintrusive, near-term advanced technology 
sensors; the phased implementation of the HMS on the SSME teststand; and a clear migration path from a 
Ground Test HMS to a Flight HMS. The initial step was to outline a functional architecture which served to 
identify the major system tasks, identify the interrelationships of these tasks, and to show the information flow 
between the tasks. 

Once the functional architecture was formulated, an iterative design methodology was used to develop 
the hardware architecture that could support the functional requirements. The emphasis in this task was to 
identify and demonstrate an overall approach to the hardware design, and not just to determine how many 
processors, buses, or other hardware interfaces would be required to implement the HMS. The effort 
described in this report is the first pass through a detailed design study that would be performed prior to 
implementation of an HMS. The intent is to highlight the major issues that need to be addressed rather than 
provide final design answers for those issues. To aid in this design methodology, UTRC demonstrated a 
computer architecture simulation tool, originally developed to assist the designers of VHSIC architectures, to 
evaluate the various hardware/software configurations for the SSME HMS. 

4.1 HMS Functional Architecture 

The requirements for a ground test version of the HMS were used as the basis for the design of the HMS 
functional architecture. The Flight HMS will incorporate a suitable subset of the ground test functions. The 
major functions of the ground test system are shown in Figure 4.1. The system task manager function oversees 
the entire operation of the HMS: it provides the user I/O, system resource management, and task scheduling 
based on the current HMS configuration and status. The five major tasks supervised by the task manager are 
engine health monitoring, test data logging, off-line data analysis, database management, and system 
communications. The engine health monitoring task contains all of the functions for fault detection and 
decision making that must run in real time to provide engine shutdown capability, and is the most critical to 
the HMS framework. Consequently, the health monitoring task was studied in greater detail than the other 
major HMS tasks. It will be shown in the hardware architecture section (Section 4.2) that this task requires 
dedicated hardware to operate in real time. Engine test data logging is another time critical task, in that it 
must provide real-time storage of all desired sensor data during a test. Its purpose is to provide local data to 
the HMS in the proper format for use in off-line data analysis and algorithm development. 

The remaining three major HMS tasks (off-line data analysis, database management, and system 
communications) are not time critical, but provide essential “housekeeping” capabilities to the HMS. It is 
anticipated that these functions will be provided by commercial off-the-shelf hardware that is part of many 
current scientific workstations. The off-line data analysis task will provide the ability to analyze data from 
SSME ground tests, verify existing algorithms, and develop new fault detection techniques which will be 
essential in maintaining and updating the monitoring coverage of the system. The database manager will 
provide the utilities to allow all HMS system data to be organized and maintained in an orderly fashion. The 
database will also contain all of the parameters, models, and thresholds that will be downloaded to the 
real-time HMS health monitoring functions. The communication task will allow the operator to transmit and 
receive information, such as the HMS data, and provide remote access to the system. 

4.1.1 Health Monitoring Task .— The purpose of this task is to provide the capability necessary to acquire 
sensor information, run fault detection algorithms, and provide a real-time assessment of the engine health. 
When an engine fault is detected, the system will transmit an engine shutdown signal to the controller through 
the CADS system. This function can be enabled or disabled as desired. 

The engine health monitoring task implements a hierarchical process in which a decision to shut down 
the engine is reached after several levels of information processing. In this hierarchy, the bottom levels process 
the sensor signals, while the middle levels determine the nominal or off-nominal operation of various engine 
parameters. The middle levels also combine the outputs of multiple fault detection algorithms to assess the health 
of specific engine components. The top level of the hierarchy then combines the health assessments of the various 
engine components to determine the overall engine health, and outputs a yes/no decision to shut down the 
engine. The functional architecture for this task is presented in Figure 4.2. 

The principal inputs to the health monitoring task are the sensors depicted in Figure 4.2, which include 
the existing CADS and facility sensors along with the near-term technology sensors selected for inclusion in 
the HMS. Separate interfaces will be required for each of the sensor information sources. Information from 
existing CADS sensors will be provided by the controller. The facility sensor information will be obtained 
through an interface to the existing facility recording system. 

The first step in the health monitoring task is the sensor processing function. The sensor processing 
function conditions any analog sensor data input signals, performs the analog-to-digital conversion, and 
scales the sensor data. The processing function then verifies the integrity of all the data before it is used by the 
fault detection algorithms. Sensor channels that are determined to be faulty are disabled. Running the fault 
detection algorithms described in detail in Section 3.1.3 for the CADS sensors, and those qualitatively 
discussed in Section 3.2 for near-term technology sensors is the final step in the processing function. 
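
The scaling and integrity verification portion of the sensor processing function can be sketched as follows. The sketch assumes digitized counts per channel, a linear calibration, and a simple range check as the integrity test; the channel names, calibration constants, and limits are hypothetical, and an actual HMS would likely apply more elaborate validation.

```python
def process_channels(raw_counts, calib, limits):
    """Scale raw counts to engineering units and disable out-of-range channels.

    calib:  channel name -> (gain, offset)
    limits: channel name -> (low, high) plausible range in engineering units
    Returns (valid_values, disabled_channel_names).
    """
    valid, disabled = {}, []
    for name, counts in raw_counts.items():
        gain, offset = calib[name]
        value = gain * counts + offset
        low, high = limits[name]
        if low <= value <= high:
            valid[name] = value
        else:
            disabled.append(name)  # excluded from fault detection algorithms
    return valid, sorted(disabled)

# Hypothetical two-channel example; the second channel is pegged at full scale.
raw = {"chamber_pressure": 2048, "discharge_temp": 4095}
calib = {"chamber_pressure": (2.0, 0.0), "discharge_temp": (0.5, -100.0)}
limits = {"chamber_pressure": (0.0, 5000.0), "discharge_temp": (0.0, 1500.0)}
valid, disabled = process_channels(raw, calib, limits)
```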

The second step in the health monitoring task integrates the outputs of multiple fault detection 
algorithms to cross-check and confirm that a fault has occurred within an LRU. This process is contained 
within the component status modules shown in Figure 4.2. Each component status module would be 
responsible for determining the health of a specific LRU, or group of LRUs, by using information supplied by 
various fault detection algorithms. For example, the health of the HPFTP would be assessed based on the 
information provided by the clustering algorithm operating on a set of sensors which focused on the HPFTP, 
along with the output from the ARMA algorithms operating on single sensors related to the HPFTP. 
Additionally, the HPFTP status module would check the information provided by the acoustic emission 
bearing diagnostics algorithms in conjunction with the plume spectroscopy algorithms to confirm the 
existence of a bearing degradation fault such as a cage deterioration. Further information about the HPFTP 
hot gas power transfer would be obtained from the optical pyrometer algorithms which would indicate a 
turbine blade hot spot. In this manner, many sources of information are used to compile a scenario of the 
potential fault. The cross-checking of multiple information sources reduces the potential for a false alarm 
caused by the spurious output of a single fault detection algorithm. Furthermore, the high degree of 
interrelatedness in the operation of the LRUs enhances the ability of the component status module to confirm 
the fault in a specific LRU by using information from the operation of other LRUs. The correlation of the 
multiple information sources is one of the steps required for identification, rather than just detection, of the 
fault. Thus, the ability to detect and identify a fault could be included in the functions of the component status 
module. 
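
The cross-checking performed by a component status module can be sketched as a simple agreement test, in which a fault is confirmed only when a minimum number of independent algorithms indicate it. The agreement count of two and the algorithm labels are assumptions for illustration; the report does not specify the combination logic.

```python
def component_status(flags, min_agree=2):
    """flags: algorithm name -> True if that algorithm indicates a fault.

    Returns (fault_confirmed, names_of_indicating_algorithms).
    """
    indicating = sorted(name for name, hit in flags.items() if hit)
    return len(indicating) >= min_agree, indicating

# Hypothetical HPFTP status inputs for one processing cycle.
hpftp_flags = {
    "clustering": True,
    "arma": False,
    "acoustic_emission": True,
    "plume_spectroscopy": False,
}
hpftp_fault, hpftp_sources = component_status(hpftp_flags)
```

With this logic, a spurious output from a single algorithm does not by itself confirm a fault.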

The next logical step beyond the component status modules is the engine status module shown in 
Figure 4.2. The function of this step in the hierarchical process is to make a decision as to whether the engine 
should be shut down or continue to operate. Just as the component status modules are responsible for 
individual components, the engine status module is responsible for utilizing several sources of information to 
assess the overall health of the engine. The primary information would come from the output of the component 
status modules as a confirmation that a fault existed within the engine and possibly an identification and 
localization of the fault. The engine status module may also use prior knowledge about the performance or 
useful life of a specific component in assessing whether it should act on a fault indication. For example, 
knowing that the main injector of the engine under operation has already achieved 95% of its predicted useful 
life and would thus have a higher probability of failure, the engine status module would take this into account 
when acting on the indication of a potential main injector related problem. At this decision making level, 
knowledge about prior performance degradation that has been confirmed to exist in the engine can be utilized 
to make the shutdown decision process more or less sensitive to the indication of a particular fault. 
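
One way to make the shutdown decision more sensitive for a component near the end of its life is to lower the fault confidence required for action as the consumed life fraction increases. The linear scaling, the sensitivity factor, and the numeric values in this sketch are assumptions chosen for illustration and are not taken from the report.

```python
def shutdown_threshold(base_threshold, life_used_fraction, sensitivity=0.5):
    """Lower the required fault confidence as a component nears end of life."""
    life = min(max(life_used_fraction, 0.0), 1.0)  # clamp to [0, 1]
    return base_threshold * (1.0 - sensitivity * life)

def engine_shutdown(fault_confidence, base_threshold, life_used_fraction):
    """Return True if the fault indication should trigger a shutdown."""
    return fault_confidence >= shutdown_threshold(base_threshold, life_used_fraction)

# The same main injector fault indication (confidence 0.5) on two engines.
new_injector = engine_shutdown(0.5, base_threshold=0.8, life_used_fraction=0.10)
old_injector = engine_shutdown(0.5, base_threshold=0.8, life_used_fraction=0.95)
```

In this example the indication is acted upon only for the injector that has used 95% of its predicted life.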

The component status and engine status modules are implemented using an expert systems approach, 
with a set of rules which correlate a certain set of circumstances with a certain conclusion or action. For 
example, given certain outputs from the cluster analysis, the optical pyrometer, and the plume spectrometer, a 
turbine blade fault may be concluded to exist. Next, given that a turbine blade fault does exist, given that the 
fault appears to be growing rapidly, and given that the turbine blades in the turbopump on the engine under 
test are made from a new material that erodes quickly, the action to shut the engine down will be taken. The 
number of such rules will be kept small so as to enable the real-time implementation of this strategy. The rules 
will be developed from a combination of studies of teststand data and the experience of SSME test and 
operation personnel. Flexibility will be required in the expert system modules so that improvements in the 
decision making process can be incorporated as the understanding of the fault detection and identification 
processes increases. The development of these expert systems will be included as a technology development 
program in the HMS implementation. 
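
The two-level rule structure described above can be illustrated with a minimal sketch. The field names, the two-of-three voting rule, and the growth-rate threshold are hypothetical choices made for illustration only; the actual rules would be derived from teststand data and operator experience as stated above.

```python
from dataclasses import dataclass


@dataclass
class Evidence:
    cluster_anomaly: bool      # cluster analysis indicates an off-nominal state
    pyrometer_anomaly: bool    # optical pyrometer indication
    plume_anomaly: bool        # plume spectrometer indication
    growth_rate: float         # hypothetical normalized fault growth rate, 0..1
    fast_eroding_blades: bool  # prior knowledge about the blade material


def component_status(e: Evidence) -> bool:
    """Component-level rule: conclude that a turbine blade fault exists.

    Assumed rule: at least two of the three indications must agree.
    """
    votes = sum([e.cluster_anomaly, e.pyrometer_anomaly, e.plume_anomaly])
    return votes >= 2


def engine_status(e: Evidence, rapid_growth: float = 0.5) -> str:
    """Engine-level rule: combine the fault indication with prior knowledge."""
    if not component_status(e):
        return "continue"
    # Shut down only if the fault is growing rapidly AND the blades are
    # known to erode quickly; otherwise keep watching the fault.
    if e.growth_rate > rapid_growth and e.fast_eroding_blades:
        return "shutdown"
    return "monitor"
```

Keeping each rule a short boolean expression reflects the stated intent to hold the rule count small enough for real-time execution.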

4.2 HMS Breadboard Hardware Architecture 

The HMS hardware architecture was developed through a preliminary design process such that it 
supported the system functional requirements while conforming to the constraints imposed by the overall 
program goals. Both the Ground Test HMS and the Flight HMS were studied to identify the major issues that 
would need to be addressed in a detailed design to be performed as part of the implementation program. 
Although these systems operate in different environments, they both evolve from the same functional 
requirements. The flight system evolves from the ground test system through the incorporation of the unique 
constraints and requirements presented by the flight regime. Consequently, a smooth transition from ground 
operation to in-flight health monitoring can be achieved. The goal of this effort was to demonstrate a design 
methodology and present a strawman hardware architecture design, and in doing so, address issues such as 
the real-time implementation of diagnostic algorithms, the maximum limits of HMS functionality, and the 
identification of concerns unique to a Flight HMS. The final system design would be based on a more 
comprehensive study of each of the critical issues and a rigorous cost versus benefits analysis of the flexible 
Health Management System. 

The preliminary design methodology used in this study includes the following major steps: 

1. Definition of functional requirements; 

2. Generation of throughput estimates; 

3. Generation of HMS hardware block diagrams, including the System Block Diagram (SBD) 
and the Detailed Block Diagram (DBD); 

4. Parametric studies on weight and power; 

5. Assessment of reliability. 

The purpose of this multistep effort was to demonstrate the design process rather than produce a 
detailed design. The results of each of the steps listed above are presented in the following sections. The 
program to implement the HMS will begin with an effort to re-examine each of these steps to refine the results. 

4.2.1 Hardware Functionality. —The primary drivers of the HMS hardware design are the number and 
complexity of the functions that the HMS must support. The engine health assessment decision methodology 
presented in the discussion of the functional architecture is sufficiently robust and flexible to incorporate 
virtually any number of sensors and algorithms operating in parallel. The sensors and algorithms discussed in 
this report range from those studied in detail as part of the Phase I program effort, to those whose specific 
functional requirements will not be known until further development on the sensors and/or algorithms is 
completed. One goal of the preliminary hardware design process is to determine the extent to which both the 
known and the unknown functions can be incorporated with state-of-the-art hardware, within the general 
program guidelines. 

Another key goal to be achieved in the design of the HMS hardware is the flexibility for the system to 
evolve as the results from technology development programs become available. The intent is to design a 
system which contains the functions that have been sufficiently proven for immediate implementation, and to 
provide the flexibility to incorporate additional functions as they become proven enhancements. 

4.2.1.1 Ground HMS Hardware Functionality. — A comprehensive set of functions for health management 
using current and future sensors was selected as the basis for the HMS hardware design. The ARMA, 
clustering, and RESID algorithms demonstrated on the CADS sensor data form the proven core functions of 
the HMS. Enhanced system performance will be achieved through the addition of functional modules (groups 
of algorithms directed at a single sensor or sensor type). A simple and logical addition of HMS functionality is 
the ability to implement the proven CADS algorithms on similar low frequency facility data measurements. 
Provisions have been made in the hardware design to incorporate the advanced spectral analysis techniques 
required for processing the raw, unfiltered data from selected CADS sensors, Facility sensors, and 
accelerometers. Furthermore, the HMS hardware design provides the processing capability to incorporate a 
selected set of advanced technology sensors. The advanced technology sensors include plume spectroscopy, 
acoustic emission bearing diagnostics, optical deflectometer, and the optical pyrometer. Finally, the 
processing required to implement the rule-based hierarchical decision making process has been included. It 
will be shown that, through appropriate design considerations, each of these enhancements to the system can 
be added at the appropriate time without detrimental effects on HMS performance. 

4.2.1.2 Flight HMS Hardware Functionality. —There is no aspect of the HMS hardware design that would 
initially preclude any of the Ground Test HMS functions from being used in the flight system. Practical 
considerations, such as the ability to fly the advanced technology sensors and the impact of the extra weight 
penalties and power requirements of the HMS, will certainly limit the number of the HMS functions which are 
transferred from the Ground Test HMS to the Flight HMS. Additionally, all of the Ground Test HMS 
functions associated with the facility sensors would be eliminated, since the facility sensors are not available 
for use in flight. 


4.2.2 HMS Hardware Requirements. — There are several general design requirements for both the Ground 
Test HMS and the Flight HMS: flexibility, cost, and a clear migration path from ground test to a flight system. 
As previously discussed, the hardware must be flexible in order to add new sensor interfaces, functionality, 
and/or computational capability. This is especially important for technology programs that have changing 
requirements. Also, the costs to procure, operate, and maintain the hardware must be minimized. Reliability 
rates for the HMS hardware, defined here as the Missed Detection of Fault (MDF) rate and the False Alarm 
(FA) rate, should not significantly impact the total reliability of the HMS. Finally, the Ground Test HMS must 
have the capability to smoothly evolve into the Flight HMS by maintaining as much commonality as is 
practical between the two systems. 


An additional requirement which results from the need for flexibility, high performance, and low cost is 
that of a modular architecture approach for packaging the hardware. In a modular architecture, the increasing 
processor and I/O requirements can be easily accommodated through the simple addition of circuit cards 
called Line Replaceable Modules (LRM). A further requirement to use off-the-shelf modular components 
provides additional cost and maintainability benefits: off-the-shelf commercial modules are less expensive 
than custom circuit boards and are readily available if the need for replacement arises. Some development 
cost savings can also be realized with commercially available modules since design costs have been amortized 
by the LRM manufacturer across many LRM sales. 

In addition to these general system requirements, there are further requirements which are specific to 
either the ground system or the flight system. This issue is briefly addressed in the next two sections. 

4.2.2.1 Unique Requirements for the Ground Test HMS. —The primary goal of the Ground Test HMS is to 
prevent engine failures, and thereby, reduce operational and maintenance costs. This is accomplished by 
shutting down an engine that has an impending failure. The size, weight, and power requirements for the 
Ground Test System, as opposed to those for the Flight System, are not major concerns since the Ground Test 
HMS is intended for operation in the control room. 

4.2.2.2 Unique Requirements for the Flight HMS. — The Flight System has the same general requirements 
as the Ground Test System, because it will evolve from the Ground Test HMS. Operationally, the Flight HMS 
differs from the Ground HMS in that the primary goal of the Flight HMS is to prevent loss of life, and hence, 
unique issues must be addressed. The MDF and FA rates must be lower for a Flight System. False alarms are 
critical because of the extremely limited engine-out capability of the shuttle and the potential safety risk to the 
crew if takeoffs are aborted due to engine shutdown. The Flight HMS hardware must be flight worthy in a very 
harsh environment. This includes immunity to severe shock, vibration, and temperature. Reliability (Mean 
Time Between Failure [MTBF] and Pre-Liftoff Abort Rate) is of greater concern for the Flight System, as are 
the size, weight, and power requirements. 

4.2.3 Throughput Analysis. — Throughput analysis assesses the processing and I/O requirements for the 
selected functions. It provides the initial insight into the complexity of the hardware required to support the 
functional architecture. On the basis of throughput alone, there is no distinction required between the Flight 
HMS and the Ground Test HMS, since the computation load for a specific algorithm is independent of the 
application location. The number of functions, and hence the total system throughput, for the Flight HMS will 
differ from the Ground Test HMS simply because the flight system will likely support fewer functions. 

For the purposes of the throughput study, the functional architecture for the real-time health monitoring 
function was subdivided into three parts: 

1. Reading the signal; 

2. Implementing the fault detection algorithms; 

3. Performing the expert system engine shutdown decision. 

The throughput for the first part is the amount of processing required to read the signal and validate it. 
This includes the operations to manipulate the A/D conversion and the multiplexers, perform range checking, 
filter, and convert the signal to engineering units. If the signal conversion has failed hard (many times in a row), 
the signal is declared to be failed, and a flag is set. The higher level processing must reconfigure around that 
fault. 
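
The read-and-validate step can be sketched as follows. This is an illustrative outline only: the class name, the first-order filter, the count limits, and the threshold of three consecutive bad conversions are assumptions, not values taken from the SSME controller or from this study.

```python
class SignalChannel:
    """Range-check, filter, and convert one sensor signal (illustrative)."""

    def __init__(self, lo_counts, hi_counts, scale, offset,
                 alpha=0.5, max_bad=3):
        self.lo_counts = lo_counts    # lowest valid A/D reading
        self.hi_counts = hi_counts    # highest valid A/D reading
        self.scale = scale            # engineering units per count
        self.offset = offset          # engineering-unit offset
        self.alpha = alpha            # first-order low-pass filter gain
        self.max_bad = max_bad        # consecutive failures before hard fail
        self.filtered = None          # filtered value, in counts
        self.bad_count = 0
        self.failed = False           # latched flag read by higher levels

    def read(self, raw_counts):
        """Process one conversion; return the value in engineering units."""
        if not (self.lo_counts <= raw_counts <= self.hi_counts):
            # Out-of-range conversion: hold the last good value and count it.
            self.bad_count += 1
            if self.bad_count >= self.max_bad:
                self.failed = True
            return self._to_engineering_units()
        self.bad_count = 0
        if self.filtered is None:
            self.filtered = float(raw_counts)
        else:
            self.filtered += self.alpha * (raw_counts - self.filtered)
        return self._to_engineering_units()

    def _to_engineering_units(self):
        if self.filtered is None:
            return None
        return self.scale * self.filtered + self.offset
```

The `failed` flag is latched so that the higher level processing, rather than the channel itself, decides how to reconfigure around the lost signal.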


The second part of the throughput calculation considers the signal processing required for the 
implementation of the fault detection algorithms. Algorithms for the sensors discussed in Section 3.1.3 were 
considered in this study. As a general rule, twice as many facility sensors as CADS sensors were 
assumed for the processing algorithms that operate on the low and high frequency facility data. The Fast 
Fourier Transform (FFT) was assumed to be the spectral estimation technique used for the raw facility and 
raw CADS sensor processing, the advanced vibration processing, the acoustic emission processing, and the 
deflectometer processing. All FFTs were assumed to have 1024 points. 
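
For reference, a 1024 point FFT used as a spectral estimate can be sketched in a few lines. This is a generic textbook radix-2 algorithm written for clarity, not the implementation assumed in the throughput study; the function names and the test signal are hypothetical.

```python
import cmath
import math


def fft(x):
    """Recursive radix-2 decimation-in-time FFT (length must be a power of 2)."""
    n = len(x)
    if n == 1:
        return list(x)
    if n % 2:
        raise ValueError("length must be a power of two")
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def dominant_bin(samples):
    """Return the index of the largest non-DC spectral line below Nyquist."""
    spectrum = fft(samples)
    half = len(samples) // 2
    magnitudes = [abs(c) for c in spectrum[1:half]]
    return 1 + magnitudes.index(max(magnitudes))


# Hypothetical test signal: a pure tone centered on bin 50 of a 1024 point record.
N = 1024
tone = [math.sin(2 * math.pi * 50 * i / N) for i in range(N)]
peak = dominant_bin(tone)
```

A radix-2 transform of length N needs (N/2) log2 N butterfly operations, which is 5120 for N = 1024; this is the kind of per-transform cost that drives the FFT column of the throughput estimate.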

The third part of the calculation quantifies the throughput required to perform the hierarchical decision 
making. This includes the processing for algorithms to combine outputs of the various fault detection 
algorithms as well as the computations for the rule based expert systems which will make the fault/no-fault 
decisions. 

The results of the throughput study are summarized in Table 4.1, which provides the results in units of 
million instructions per second (MIPS). Several conclusions can be drawn from Table 4.1: 

1. The CADS Serial Link Processing (ARMA, clustering, and RESID) is a very small part of 
the total throughput required. 

2. An extensive amount of processing is required for the advanced vibration monitoring and 
the unfiltered raw CADS and facility sensors. This is due to the uncertainty in the 
algorithms that will be required for this processing and the likely reliance on 
computationally intensive FFT processing to provide spectral information. 

3. Processing requirements for the advanced technology sensors are generally very small. The 
exception is acoustic emission bearing diagnostics, because the maturity of the technique to 
date still requires high frequency digital signal processing. This computational load is 
expected to be dramatically reduced in the near future as analog signal processing circuits 
are developed. 
Table 4.1 THROUGHPUT REQUIREMENTS FOR HMS FUNCTIONS 
(all values in MIPS) 

Function                              I/O     FFT   Algs.   Total 
CADS Serial Link Data Processing      0.1     0.0    13.8    13.9 
Low Frequency Facility                0.2     0.0    27.6    27.8 
Raw CADS Data Proc.                   0.6    23.4     8.5    32.5 
Raw Facility Data Proc.               1.1    46.8    44.5    92.4 
Advanced Vibration                    0.1    10.2    28.1    38.4 
Adv. Tech. Sensors: 
  Plume Spectroscopy                  0.1     0.1             0.2 
  Acoustic Emissions                  6.1     9.9     3.7    19.7 
  Optical Deflectometer               0.9     5.5     1.1     7.5 
  Optical Pyrometer                   0.1       •     0.1     0.2 
Engine, Component Status Modules      0.1       •     0.1     0.2 
Total                                 9.4    95.9   127.5   232.8 

Note: The above data is based on the number of instructions that an 
Intel 80960CA processor would take to perform the given function. Based 
on an analysis of the Intel data sheets, the processor was 
conservatively assumed to operate at an effective throughput of 
12 Million Instructions Per Second (MIPS). Further, it is 
assumed that all software is coded in Ada. Current Ada 
compilers generate code that takes an estimated two to five 
times longer than manually generated assembly code to execute. 

4. The raw facility data processing represents nearly half the total system processing load, and 
hence, it is logical to separate this function and place it on a single subsystem. In this 
manner, calculations are not fragmented and the migration to a Flight HMS is simplified 
since the facility sensor processing will not be required in flight. 

5. The FFT processing is a very large portion of the computations and is an excellent 
candidate for implementation by special purpose processors such as a UTMC (United 
Technologies Microelectronics Center) chip that can perform a 1024 point FFT calculation in 
700 microseconds. 

6. 200 MIPS is an extremely large throughput requirement for an embedded system. (Typical 
gas turbine fuel controls are on the order of 1 to 10 MIPS.) The processing requirements for 
the implemented HMS will be significantly less, since it is not likely that all of the functions 
will be incorporated into the system. 
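
The arithmetic behind these conclusions can be checked with a short script. The values are transcribed from Table 4.1; entries that are blank or marked with a bullet in the table are taken as zero, which is an assumption. The processor count is only a rough equivalent obtained by dividing the total by the 12 MIPS effective rate quoted in the table note, and the names below are hypothetical.

```python
import math

# (I/O, FFT, Algs.) in MIPS, transcribed from Table 4.1.
# Blank or bulleted table entries are assumed to be 0.0.
ROWS = {
    "CADS Serial Link Data Processing": (0.1, 0.0, 13.8),
    "Low Frequency Facility": (0.2, 0.0, 27.6),
    "Raw CADS Data Proc.": (0.6, 23.4, 8.5),
    "Raw Facility Data Proc.": (1.1, 46.8, 44.5),
    "Advanced Vibration": (0.1, 10.2, 28.1),
    "Plume Spectroscopy": (0.1, 0.1, 0.0),
    "Acoustic Emissions": (6.1, 9.9, 3.7),
    "Optical Deflectometer": (0.9, 5.5, 1.1),
    "Optical Pyrometer": (0.1, 0.0, 0.1),
    "Engine, Component Status Modules": (0.1, 0.0, 0.1),
}

PROCESSOR_MIPS = 12.0        # effective 80960CA rate from the table note
FFT_CHIP_SECONDS = 700e-6    # quoted time for one 1024 point FFT

io_total = sum(r[0] for r in ROWS.values())
fft_total = sum(r[1] for r in ROWS.values())
alg_total = sum(r[2] for r in ROWS.values())
grand_total = io_total + fft_total + alg_total

# Share of the load due to raw facility data, and to all facility data.
raw_facility = sum(ROWS["Raw Facility Data Proc."])
all_facility = raw_facility + sum(ROWS["Low Frequency Facility"])
raw_facility_share = raw_facility / grand_total
all_facility_share = all_facility / grand_total
fft_share = fft_total / grand_total

# Rough number of 12 MIPS processors needed if every function were included.
processors_needed = math.ceil(grand_total / PROCESSOR_MIPS)

# Upper bound on 1024 point FFTs per second from one special purpose chip.
ffts_per_second = int(1.0 / FFT_CHIP_SECONDS)
```

With these figures the raw facility processing is about 40 percent of the total, the raw and low frequency facility processing together are about 52 percent, and the FFT column is about 41 percent.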

4.2.4 HMS Hardware Block Diagrams. — Discussions of the hardware block diagrams for the Ground Test 
HMS and the Flight HMS are presented in this section. A subset of the Ground Test HMS will serve as a 
prototype for the Flight HMS. 

4.2.4.1 Ground Test HMS. —The HMS must be integrated into the SSME environment with minimum 
impact. The interfaces should be as simple as possible in order to minimize the design errors associated with 
integration and also teststand downtime. Above all, HMS reliability is critical: it is essential that the 
probability of fault propagation from the HMS to the SSME controller is extremely low. 

Figure 4.3 represents a concept for integrating the Ground Test HMS with the SSME. The existing 
Ground Test System is composed of the SSME, CADS and Facility Sensors, the Block II SSME Controller, 
actuators, and the teststand facility itself. The CADS sensor suite is dual redundant and provides signals to 
the dual channel controller. The controller, in turn, transmits commands that control the fuel and oxidizer 
valves. The controller also has a serial link (CADS Serial Link) which transmits 128 parameters (conditioned 
sensor values and status words) every 40 ms. 

New HMS equipment will consist of a Ground Test HM Subsystem, Facility Data HM subsystem, and 
Advanced Technology Sensors. The Ground Test Health Monitoring Subsystem will be a rack of cards that will 
serve as the prototype for the Flight HMS electronics. The Facility Data HM Subsystem represents the 
electronics dedicated to collecting data unique to the facility, and will only be used for ground test. Based upon 
the throughput study previously described, this partitioning of the Ground Test HMS is logical because the 
computational loads are approximately equal, and the capability to migrate to a flight system is maintained. 

The Advanced Technology Sensors block represents near-term sensors which provide additional 
diagnostic information to the HMS. These include a plume spectroscopy system, acoustic emission sensors, 
optical pyrometers, and fiber optic deflectometers. Each advanced technology sensor will require a separate 
interface to the Ground Test HM Subsystem. The final interface between the existing test system and the 
Ground Test HM Subsystem will receive raw CADS sensor data. It is essential that the HMS receive the raw CADS 
sensor signals without adversely impacting the controller reliability. Rocketdyne suggested in the final report 
for SAFD Phase III [1] that no electronic components be placed in the path of the CADS serial link and that 
the signal be tapped using a high impedance transformer. The same design philosophy can be extended to 
tapping into the controller-sensor cable. Only those sensors which are not critical to engine control are 
candidates for this approach. Careful study and a trade-off analysis are required to assess reliability costs versus 
benefits. 

4.2.4.1.1 Ground Test HMS Block Diagram. — A detailed hardware block diagram for the Ground Test 
HM Subsystem is shown in Figure 4.4. The system is composed of industry standard VME cards and buses. In 
order to keep cost of development low, a limited number of cards will be used in multiple applications within 
the HMS. 

Sensor measurements are processed by a group of cards referred to as a Data Processing Functional 
Group (DPFG). The Ground Test HM subsystem consists of DPFGs to process data from the CADS Serial 
Link, Raw CADS Sensors, Vibration Sensors, and the Advanced Technology Sensors. One of the CPU cards 
shown on the right side of the diagram is used for the Component Status Modules and the Engine Status 
Module. The Mass Storage System, depicted as a single card and controlled by the second CPU card shown in 
the upper right half of the figure, collects the data generated by the DPFGs in real time. There are two serial 
links in the system: one is used to collect data from the Facility Data Processing Subsystem, while the other, 
the High Speed Data Bus (HSDB) card, is used to transmit data to the Facility. Finally, an option has been 
included for a redundancy management card. 

The serial links to the Facility, and the Facility Data Processing Subsystem are the only parts of the 
Ground Test HM Subsystem that would not migrate into the flight system. Hence, they could be purchased 
off-the-shelf without being concerned about flight system issues. The result is that these cards will be 
inexpensive to procure and integrate into the system. The card selected for transmitting data to the Facility, the 
HSDB card, utilizes a fiber optic based protocol and yields very high data transfer rates. There are many other 
viable options which include 1553, conventional RS-232, or the existing CADS Serial Link protocol. The 
choice would most likely be made by consulting with the facility test staff to minimize cost of retrofitting the 
facility. 

To save on non-recurring hardware and software development costs, the Facility Data Processing 
Subsystem, Figure 4.5, uses the same card types as the Ground Test HM Subsystem. 

4.2.4.1.1.1 Data Processing Functional Groups (DPFGs).— The DPFGs, shown in Figure 4.6, are 
configured using a minimum number of card types. A DPFG may consist of one, two, or three cards; the most 
common cards are the CPU and DSP cards. 

There are two very important DPFG attributes. First, a private bus between the CPU card and the 
interface card allows the designer to avoid the data transfer bottlenecks common in systems with only one data 
bus. Second, the DPFG forms a type of “Fault Containment Region (FCR).” In a system partitioned into such regions, faults do not 
propagate from one region to another. In this case, the DPFG forms a containment region that deters fault 
propagation out of the DPFG. The isolation is not 100%, as faults can propagate over power supply lines and 
the DPFG’s interface to the HMS, but many faults within a DPFG (e.g., DSP failure or private bus failure) 
will not propagate. Thus, the distributed architecture of the HMS Subsystem increases the system 
dispatchability (in comparison to a conventional, single bus system) despite localized failures. 
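
The containment idea can be illustrated with a small software analogy. The DPFG itself is a hardware grouping; the sketch below only mimics its behavior, and the class and function names are hypothetical.

```python
class DPFG:
    """Software stand-in for one Data Processing Functional Group."""

    def __init__(self, name, process):
        self.name = name
        self.process = process   # callable applied to each sample
        self.failed = False

    def step(self, sample):
        """Process one sample; a local failure is contained, not propagated."""
        if self.failed:
            return None
        try:
            return self.process(sample)
        except Exception:
            # The fault stays inside this group; the group is marked failed.
            self.failed = True
            return None


def run_frame(groups, sample):
    """Run every group on one sample; failed groups report None."""
    return {g.name: g.step(sample) for g in groups}


groups = [
    DPFG("cads_serial", lambda s: s * 2.0),
    DPFG("vibration", lambda s: 1.0 / (s - 1.0)),  # fails when s == 1.0
    DPFG("adv_sensors", lambda s: s + 10.0),
]
frame = run_frame(groups, 1.0)
```

In this example the failure of one group leaves the outputs of the other two intact, which is the property that improves dispatchability relative to a single shared processing path.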

4.2.4.1.1.2 Interface. —Expanded views of the Ground Test HM Subsystem (Figure 4.7) depict the various 
interface boards and VME cards required. The interface boards for the DPFGs can be procured from 
commercial vendors. Note, however, that sometimes commercial products are not applicable to aerospace 
embedded systems: real world problems include data latencies, lack of verified interface software, and 
inability to adequately detect cable opens and shorts. The alternatives are to modify existing commercially 
available boards or to custom design boards. Careful selection of custom designed cards and commercially 
available cards can help to optimize the system configuration. (Note that optimum is defined here as the 
best trade-off between the risks involved in migration to a flight system and the development costs.) 

Fig. 4.5 FACILITY DATA PROCESSING SUBSYSTEM DETAILED BLOCK DIAGRAM 

United Technologies Hamilton Standard, for example, has developed VME interface boards such as the 
frequency board, the pressure/temperature board, LVDT board, and analog input boards under the National 
Aero-Space Plane (NASP) Program. In some cases, notably the LVDT card, there is no commercially 
available card that meets the requirements of the sensor. Another reason to design a card is to utilize 
components with the power, weight, reliability and performance that a flight system may require. The use of 
similar components minimizes risk in developing the Flight System. 

4.2.4.1.1.3 Processor. — The processor card is the most important card in the system because of its 
widespread application through the system. The processor selected for the HMS is the Intel 80960CA, which is 
synchronized using a master 10 ms interrupt. There is a serial link for downloading programs and uploading 
data. The CPU would contain a small boot program (approximately 2 K) that would provide the instructions 
for reading and writing the program/data, as well as initiating the start of the program. Most importantly, the 
card will be capable of interfacing both to the main VME bus and a local/private bus. Depending on cost and 
criticality, this card might have Built-in-Test (BIT) hardware such as Loss-of-Clock detectors, memory parity 
check circuitry, and watchdog timers that force a system reset if the card fails. 
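
The watchdog behavior can be sketched as a frame-counting simulation. This is an illustration of the general technique only; the two-frame timeout and the class name are assumptions, and a real watchdog would be a hardware timer on the card.

```python
class Watchdog:
    """Counts 10 ms frames since the software last serviced the timer."""

    def __init__(self, timeout_frames=2):
        self.timeout_frames = timeout_frames
        self.frames_since_kick = 0
        self.resets = 0

    def kick(self):
        """Called by healthy application software once per frame."""
        self.frames_since_kick = 0

    def tick(self):
        """Called on every master interrupt; returns True if a reset is forced."""
        self.frames_since_kick += 1
        if self.frames_since_kick > self.timeout_frames:
            self.resets += 1
            self.frames_since_kick = 0
            return True
        return False


wd = Watchdog(timeout_frames=2)

# Healthy operation: the software services the watchdog every frame.
healthy = []
for _ in range(5):
    wd.kick()
    healthy.append(wd.tick())

# Hung software: no more kicks, so the timer eventually forces a reset.
hung = [wd.tick(), wd.tick()]
```

Tying the timeout to the master interrupt means a hung card is detected within a small, fixed number of frames.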

4.2.4.1.1.4 Mass Storage System. —The Mass Storage System will consist of a bulk RAM card (Random 
Access Memory, a form of volatile memory). At a user defined time during the ground test, or at the 
completion of the ground test, the HMS will write the data from this card to the hard disk. Typically, the data 
cannot be written in real time, as the data transfer rates on hard disks are not fast enough. 

4.2.4.1.2 Redundancy Management. —The system could include a Cross Channel Data Link and 
Redundancy Management Board. It is envisioned that most SSME testing will only require simplex (single 
channel) redundancy. However, when development of a flight system is initiated, the redundancy 
management will be incorporated with minimal impact to the existing Ground Test Subsystem. The major 
components of such a redundancy management board are currently being developed by United Technologies 
Hamilton Standard Division under IRD program #89DA4.3.13. The algorithms that this concept utilizes have 
been proven under the NASA sponsored contract for the X-Wing aircraft. 

4.2.4.1.3 Software. — The software development methodology will be in accordance with 
DoD-STD-2167A, the DoD standard for developing software, or in accordance with a NASA equivalent standard. 
The methodology is shown in Figure 4.8. The first step is to specify the system (both hardware and software) in 
a document known as the System/Segment Specification (SSS). Next, the functional requirements for the software 
are specified in a Software Requirements Specification (SRS). A top level design is generated and is followed 
by the corresponding detailed design. Both of these steps are documented in the Software Design Document 
(SDD). Only after these steps are performed can the actual coding begin. The code is read and then checked at the 
module level. The software is tested on the hardware in a step known as CSC (Computer Software Component) 
Integration Testing. Finally, the software is formally tested against the SRS in a verification step known as CSCI (Computer Software 
Configuration Item) Testing. The result is a well disciplined software development process that performs to 
cost and schedule requirements with a minimum of errors. 

Fig. 4.8 DOD-STD-2167A SOFTWARE DEVELOPMENT PROCESS 

4.2.4.1.4 Physical Packaging. —The triple height VME boards for the Ground Test Subsystem fit into one 
VME rack, while those for the Facility Data Processing Subsystem fit into another. Both VME racks will be 
housed in a standard 19 inch cabinet with a service door on one side and electrical connectors on the other. The 
cabinet is six feet tall, and the power supplies will be mounted in the bottom. Such a system is easy to configure 
and maintain. 

If cable runs are extremely long, with potentially heavy electro-magnetic interference (EMI), the system 
could be outfitted with fiber optic interfaces that convert electrical signals to light-based signals. Note that 
long runs are typical of rocket test installations. 

4.2.4.1.5 Support Equipment. — The principal piece of support equipment for the HMS is the 
Development Test System (DTS). This is a VME based system housed in the same type of cabinet as the 
application hardware. The VME cards in the DTS simulate rocket engine sensor signals. The DTS has a desktop 
computer that allows the operator to download programs and to upload data via a serial link to the 
processor cards, and a hard disk for program and data storage. Current versions of the DTS use RS-422 or 
MIL-STD-1553B asynchronous data formats for the serial link. The operator can command the DTS to 
simulate engines in real time automatically using data from a prepared database. Additionally, the DTS can be 
used to verify the integrity of the hardware as part of acceptance testing. 

The DTS is linked with a VAX based host computer, via either an RS-232 data link or a conventional 
modem, for remote operation. The HMS software will be developed on the host computer. In addition, the 
VAX communicates with the workstation where the fault detection algorithms are developed, so that 
algorithm designers can generate test cases, evaluate their results, and verify real-time system 
implementation. The host computer would also be capable of converting the data from 9-track computer 
tapes of actual rocket engine tests for real-time simulation by the DTS. 

Finally, the entire DTS, complete with signal simulation capability, can be taken to the teststand to aid in 
isolating faults during integration. Very often, only the desktop computer is required in the field. The benefit 
of this reduced system will be the cost savings. 

4.2.4.2 Flight HMS. — Figure 4.9 shows the relationship of the Flight HMS to the existing SSME 
controller. This installation is dual channel in both the control and the HMS, and represents the optimal 
redundancy. In contrast, a simplex system would have an unacceptably high False Alarm rate, since HMS 
faults could not be distinguished from engine faults. Unlike in the ground test system, false alarms in flight significantly 
impact the mission success rate. A triplex system, on the other hand, is excessive and represents an 
unacceptable cost, weight, reliability, and power penalty. 
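
The difference between simplex and dual channel operation can be shown with a minimal sketch. The decision logic below, in which both channels must agree before an engine fault is declared, is an assumed illustration and not the redundancy management algorithm referred to in this report.

```python
def simplex_decision(channel_fault: bool) -> str:
    """A single channel cannot tell an HMS fault from an engine fault."""
    return "engine_fault" if channel_fault else "nominal"


def dual_channel_decision(chan_a_fault: bool, chan_b_fault: bool) -> str:
    """Cross-check two independent HMS channels (assumed logic)."""
    if chan_a_fault and chan_b_fault:
        return "engine_fault"       # both channels agree
    if chan_a_fault != chan_b_fault:
        return "hms_disagreement"   # suspect an HMS fault, not the engine
    return "nominal"
```

A spurious indication from one faulty HMS channel would shut the engine down in the simplex case, whereas the dual channel comparison flags it as a disagreement.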

4.2.4.2.1 Flight HMS Block Diagram. —The architecture of the Flight HMS is virtually identical to that of 
the ground test unit with several functional exceptions (Figure 4.10). First, there is no serial link required for 
the Facility Data Processing. Second, the Mass Storage System is not required for flight. Third, the Line 
Replaceable Modules (circuit boards) would be repackaged for the rigors of a flight environment. These 
changes to the ground test hardware are fairly minor, and thus result in a low risk, synergistic design. 

4.2.4.2.1.1 Data Processing Functional Groups.— It is desired to retain the same DPFGs in the Flight 
HMS as in the Ground Test HMS. There will be one, two or three cards in a group that will perform signal 
processing. In the case of the three card DPFG, there will be an interface card, a DSP (Digital Signal 
Processing) card, and a CPU card. The use of DPFGs will maintain the design concept developed for the Ground Test HMS and assure a 
smooth migration between the two systems. 

Fig. 4.10 FLIGHT SUBSYSTEM DETAILED BLOCK DIAGRAM 

4.2.4.2.1.2 Interface. — It is desired to re-use the electronic design that was developed in the Ground Test 
HMS. The only change is that the card will be repackaged to withstand the flight environment. The use of 
common designs minimizes risk to the hardware design. Furthermore, software costs and schedule impacts 
are minimized in transition to the Flight HMS. 

4.2.4.2.1.3 Processor.— A very important commonality between the Ground Test HMS and the Flight 
HMS is the CPU card. If the same CPU card is used in both systems, substantial software costs and schedule 
savings will be realized. Furthermore, the design risks typically associated with a change in the CPU card are 
greatly reduced. 

Note that the discussion is not limited to the microprocessor on the CPU card; it can extend to 
commonality of the entire board. Manipulation of timers, interrupts, fault logic, etc. by the software would 
remain identical between the two versions of the HMS. In summary, if design changes are truly limited to 
repackaging of the card, then cost, schedule, and risk can be substantially reduced. 

4.2.4.2.2 Redundancy Management. —The same electronic design of the Redundancy Management Card 
from the Ground Test HMS will be used in the Flight HMS. 

4.2.4.2.3 Software. — As mentioned above, the use of common hardware designs between the Ground Test 
HMS and Flight HMS results in substantial cost savings in software development. Furthermore, schedule and 
risk are minimized. 

The software methodology will follow DoD-STD-2167A. The flight system, however, will have a more 
stringent development procedure which will include module code read, module test, and more formalized 
system level tests. By delaying these steps until the Flight HMS program, substantial cost savings are 
realized during the ground test development. 

4.2.4.2.4 Physical Packaging. —Some standard LRMs may not be able to withstand the harsh 
environment of engine mounting and flight systems. Of particular concern are requirements for vibration, 
weight, size, power consumption, reliability, and cost. 

A typical LRM is shown in Figure 4.11. In this example, the board size is specified by the SEM-E format, 
which has been specified in the past by DoD. The most recent version of the SEM-E specification can be 
found in the Joint Integrated Avionics Working Group (JIAWG) document number J88-G2B. JIAWG is responsible 
for setting up a common avionics baseline for the Advanced Tactical Fighter (ATF), the Light Helicopter Program 
(LHX) and the A-12 Navy Aircraft. Typically, a board dissipates 20 to 30 watts of power and weighs 1.5 to 15 
pounds. The dimensions are 5.88 in. by 6.68 in. The thermal wedge lock clamps have a dual purpose: they 
provide a high thermal conductivity path to dissipate heat from the board to the rack, and they secure the 
card in the rack. 


This board would have components mounted on both sides using surface mount technology 
(SMT). SMT offers four times the packaging density of conventional dual in-line packages. The challenge with 
SMT is to ensure that solder stresses are minimized. A common source of stress is the difference between the 
thermal expansion rates of the component and the circuit board. The solution is to mount the circuit board on a 
material that has a similar rate of expansion as the board. A typical approach is to sandwich a copper-invar-copper 
core between the circuit boards; the copper acts as a good heat sink, while the invar controls the rate of thermal 
expansion of the boards. 

Fig. 4.11 TYPICAL LINE REPLACEABLE MODULE 

The question of which standard LRM format to use is a trade-off issue that must be addressed prior 
to detailed design. The goal of any LRM standard is to reduce Life Cycle Costs by eliminating the need for 
custom designs and by allowing hardware to be mass produced. Another concern is the cost of implementing a 
function using standard LRMs. If, for example, it takes two standard cards to implement a function that could 
be implemented on one custom design card, the standard incurs a weight, size, power and reliability penalty. 
The end result is that the trade must consider the design costs, schedule, procurement and maintenance costs 
as well as potential size, weight, power and reliability impacts. 

4.2.5 Power and Weight Parametric Studies . — The weight, power consumed, and failure rate of the 
hardware described in the previous sections can be calculated. The results, given in Table 4.2, yield the upper 
bound for a possible Flight HMS with a very high degree of functionality. 

The parameters for the HMS with varying functionality were also computed. The results, shown in Table 
4.3, demonstrate that there is a trade-off between functionality and weight. If the system with full functionality 
is not feasible in terms of weight, power and reliability, then a subset of the HMS might be. The assumptions 
made in the parametric studies are summarized in Table 4.4. 
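
The totals in Table 4.2 can be cross-checked with a short calculation. The sketch below, written in Python purely for illustration, sums the per-channel values transcribed from Table 4.2 and converts the combined failure rate to an MTBF. It assumes (this is not stated in the text) that failure rates are constant and that any single equipment failure counts as a system failure. The names `CHANNEL_EQUIPMENT`, `channel_totals`, and `mtbf_hours` are hypothetical.

```python
# Per-channel values transcribed from Table 4.2:
# (weight in lb, power in watts, failures per million hours)
CHANNEL_EQUIPMENT = {
    "HM LRU": (36.0, 600.0, 1480.0),
    "Rack": (21.5, 0.0, 200.0),
    "Cables": (19.9, 0.0, 124.0),
    "Sensors": (13.6, 0.0, 340.0),
}


def channel_totals(equipment):
    """Sum weight, power, and failure rate over all equipment in one channel."""
    weight = sum(w for w, _, _ in equipment.values())
    power = sum(p for _, p, _ in equipment.values())
    fpmh = sum(f for _, _, f in equipment.values())
    return weight, power, fpmh


def mtbf_hours(fpmh):
    """Convert failures per million hours to MTBF (assumes constant failure rates)."""
    return 1.0e6 / fpmh


weight, power, fpmh = channel_totals(CHANNEL_EQUIPMENT)

# The flight system is dual redundant, so every total is doubled.
dual_weight, dual_power, dual_fpmh = 2 * weight, 2 * power, 2 * fpmh
dual_mtbf = mtbf_hours(dual_fpmh)
```

Under these assumptions, the two-channel failure rate of 4288 failures per million hours corresponds to roughly 233 hours, which agrees with the Strawman MTBF listed in Table 4.3.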

4.2.6 Markov Modeling Analysis of the HMS .— Markov modeling analysis was used to assess the 
reliabilities of the Ground Test HMS and the Flight HMS. The results of this analysis are discussed below. 

4.2.6.1 Ground Test Reliability. —The Ground Test HMS pre-firing abort rate, missed detection of faults 
(MDF) rate, and false alarm (FA) rate (components of the HMS reliability) are calculated in this subsection. 

4.2.6.1.1 Pre-Firing Abort Rate . —It is assumed that the only time a ground test would be halted prior to 
firing would be if there were a gross failure of the HMS, i.e. any failure that completely fails the entire single 
channel HMS. The statistically significant causes include: 

Pps   = Probability of Power Supply Failure per Firing 

Pcpu  = Probability of CPU Failure per Firing (Note this is the Engine/Component Status Module CPU) 

Phsdb = Probability of HSDB Card Failure per Firing 

Pbus  = Probability of an Internal Bus Failure per Firing 

Assuming that the HMS is powered for one hour prior to firing, the probabilities are estimated to be: 

Pps   = 80 failures per million firings 

Pcpu  = 80 failures per million firings 

Phsdb = 80 failures per million firings 

Pbus  = 40 failures per million firings 

The probability of a ground test pre-firing abort for a simplex system is the arithmetic sum of these four 
rates: 0.000280 failures per firing. If the engine failure rate is on the order of 0.01 failures per firing, then the 
HMS has a negligible contribution to the engine failure rate. 
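
The arithmetic behind the pre-firing abort rate can be sketched as follows. This is an illustrative Python fragment, not part of the original analysis: it sums the four per-firing probabilities quoted above and also computes the exact probability that at least one failure occurs, assuming the four failures are independent (an assumption made here). The function names are hypothetical.

```python
# Failures per million firings, as quoted in the text.
FAILURES_PER_MILLION = {"Pps": 80, "Pcpu": 80, "Phsdb": 80, "Pbus": 40}


def abort_probability_sum(rates_per_million):
    """Rare-event approximation: the arithmetic sum of the individual probabilities."""
    return sum(rates_per_million.values()) / 1_000_000


def abort_probability_exact(rates_per_million):
    """Probability that at least one failure occurs, assuming independence."""
    survive = 1.0
    for rate in rates_per_million.values():
        survive *= 1.0 - rate / 1_000_000
    return 1.0 - survive


approx = abort_probability_sum(FAILURES_PER_MILLION)
exact = abort_probability_exact(FAILURES_PER_MILLION)

# Compare with the engine failure rate of 0.01 failures per firing assumed in the text.
ENGINE_FAILURE_PER_FIRING = 0.01
hms_share = approx / ENGINE_FAILURE_PER_FIRING
```

Because each probability is very small, the simple sum and the exact value differ by less than one part in ten million, which is why the arithmetic sum is adequate here. The resulting HMS share is about 3% of the assumed engine failure rate.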



Table 4.2 MAXIMUM WEIGHT, POWER AND RELIABILITY OF HMS 

Equipment                               Weight (lb)   Power (Watts)   Failures per Million Hours 
HM LRU                                      36.0           600                 1480 
Rack                                        21.5             0                  200 
Cables                                      19.9             0                  124 
Sensors                                     13.6             0                  340 
Total for 1 Channel                         91.0           600                 2144 
Total for 2 Channels (multiply by 2)       182.0          1200                 4288 


Table 4.3 RESULTS OF PARAMETRIC STUDIES OF OTHER SYSTEMS WITH VARYING FUNCTIONALITY 

Function                   Option 1   Option 2   Option 3   Strawman 
CADS Serial Link DP            X          X          X          X 
Vibration DP                   -          X          X          X 
CADS Sensor DP                 -          -          X          X 
Adv. Tech. Sensors DP          -          -          -          X 
Weight (lb)                  37.4       66.3      101.5      182.0 
Power (Watts)                 336        528        768       1200 
MTBF (hours)                 1030        586        409        233 


Table 4.4 ASSUMPTIONS USED IN PARAMETRIC STUDIES 

Equipment             Weight (lb)   Power (Watts)   FPMH* 
Digital LRMs              1.50          30.0          80 
PS LRM                    2.50          40.0          80 
Other LRMs                1.50          20.0          40 
Rack (54 slots)          43.00          43.0         200 
Cable (per sensor)        0.32           0.0           2 
Sensor                    0.40           0.0          10 

* FPMH: Failures per Million Hours 


4.2.6.1.2 MDF and FA Rates. —The results of this modeling are summarized in Table 4.5; a ground test 
HMS with algorithm coverage, Ca, of 0.90 will result in an order of magnitude reduction in the loss of engines. 
Similarly, the probability of false alarm per firing, Pfa, must be no greater than 0.06 false alarms per test. A 
discussion of the analysis follows. 

The Markov Model for the ground test system is shown in Figure 4.12. The symbols are defined as 
follows: 

Pe        = Probability per Flight of Engine Failure 

Ph        = Probability per Flight of Massive HMS Failure 

Ph*       = Probability, given an engine failure, that the HMS has a gross failure prior 
            to engine fault detection (a one second window during which a power 
            supply or Engine/Component Status Module or CPU within one channel 
            fails) 

Pspurious = Probability per flight that the HMS fails due to lightning, EMI or power 
            transient 

Ch        = Coverage of the HMS expressed as a Probability, given a massive HMS 
            Failure, that the failure is detected by either the failed channel or the 
            remaining healthy channel 

Pfa       = Probability per Flight of False Alarm due to the HMS 

Ca        = Coverage of algorithm, expressed as a probability, given that the engine 
            failed, that the algorithm will correctly identify the engine as being failed 

For all tests it was assumed that an engine is fired for 10 minutes and that Pspurious is negligible. Based on 
the HMS having a 95% Built-In Test (BIT) coverage, the value of Ch for a single channel system is typically 
0.95. 


Normally, coverage refers to the ability of a controller to detect and isolate a failure, and then reconfigure 
around it. In the case of a single channel HMS, there is no need to isolate the failure any further than the HMS. 
Also, having the HMS deactivate itself is an acceptable form of reconfiguration; engine tests can continue 
without the HMS even though it represents a statistical risk. This risk is small because the chance of both an 
HMS failure and an engine failure is relatively remote. 

To assess the impact of the hardware on the MDF and FA rates, it was assumed that the algorithm was 
perfect, that is, the Ca parameter was 1.000 and Pfa was 0.0000. The resulting MDF and FA rates were, 
respectively, 6.6E-6 and 1.6E-5 events per test. Note that the critical path for MDF is the path through states 
A, C, G, K and I. The FA rate is dominated by the path through states A, C, H, and L. 

The goal of the hardware implementation was for the hardware contribution to the MDF and FA rates to 
be an insignificant (approximately 10% or less) portion of the total MDF and FA rates. However, these total 
rates are not known, since the Ca and Pfa parameters for real-world, imperfect algorithms are not known. They 
can, however, be derived from the propulsion system failure rate, which consists of the failure rate for the 
engine and the fuel storage and delivery system (tanks, pipes, shutoff valves, etc.). The MDF rate should be at 
least an order of magnitude lower than the propulsion failure rate. If one assumes that the propulsion failure 
rate is 0.02, then the resulting sum of the MDF rates for an HMS equipped engine is 0.002, an insignificant 
contribution. 

Table 4.5 MISSED DETECTION OF FAULTS (MDF) AND FALSE ALARM RATE (FA) FOR GROUND TEST SYSTEM 

Assumption                     HMS with         HMS with Real          No HMS 
                               Perfect Alg.     World Algorithm 
Coverage of Alg. (Ca)          1.000            0.900                  N/A 
Prob. of False Alarm (Pfa)     0.000            0.060                  N/A 
Source of Failure              Hardware         Hardware &             N/A 
                                                Software Algorithm 
Resulting Rates 
  MDF                          6.6E-6           0.002                  0.020 
  FA                           1.6E-5           0.060                  0.000 

Fig. 4.12 MARKOV MODEL USED TO CALCULATE MDF AND FA RATES FOR GROUND TEST 

Also, the MDF rate figure can be useful in determining requirements for Ca. If MDF is equal to 0.002, then 
Ca must be 0.90 based on the Markov Model of Figure 4.12. The requirement for Pfa can also be determined. 
Suppose one required the false alarm rate to be no greater than three times the rate of propulsion failures, 
or 0.06. The corresponding Pfa for the Ground Test HMS would be 0.06 false alarms per engine test. 
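
The relationship between the propulsion failure rate, Ca, and the MDF rate can be illustrated with a first-order approximation. The Python sketch below is not the Markov Model of Figure 4.12; it assumes that the hardware contribution (6.6E-6 per test) is negligible, so that a fault is missed only when the engine fails and the algorithm does not cover the failure. The function names are hypothetical.

```python
def missed_detection_rate(p_engine, ca):
    """First-order MDF rate: the engine fails and the algorithm does not detect it."""
    return p_engine * (1.0 - ca)


def required_coverage(p_engine, mdf_target):
    """Algorithm coverage needed to hold the MDF rate at mdf_target."""
    return 1.0 - mdf_target / p_engine


P_PROPULSION = 0.02  # propulsion failure rate per test assumed in the text

mdf = missed_detection_rate(P_PROPULSION, ca=0.90)
ca_needed = required_coverage(P_PROPULSION, mdf_target=0.002)

# False alarm limit: three times the propulsion failure rate.
pfa_limit = 3 * P_PROPULSION
```

With a propulsion failure rate of 0.02, this approximation reproduces the values in Table 4.5: an MDF rate of 0.002 for Ca of 0.90, and a false alarm limit of 0.06.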

4.2.6.2 Flight System Reliability. —The Flight HMS pre-liftoff abort rate, MDF rate, and FA rate 
(components of Flight System reliability) are calculated in this subsection. 

4.2.6.2.1 Pre-Liftoff Abort Rate. —The calculation of the Flight HMS pre-liftoff abort rate is similar to 
that of the ground test. The difference is due to the additional Redundancy Management Card and the dual 
channel redundancy of the flight system. Defining Prm (the probability of a redundancy management card 
failure per flight) as equal to Pcpu, the resulting pre-liftoff abort rate is estimated to be 0.000720 failures 
per flight. It is assumed that if either channel fails grossly prior to liftoff, the flight will be aborted. Again, the 
HMS will have a negligible contribution to the overall abort rate. 
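
One reading that reproduces the quoted 0.000720 figure is sketched below in Python. It assumes that each channel carries the four ground test failure sources plus the Redundancy Management Card, and that a gross failure of either channel aborts the flight, so the per-channel probabilities add. The variable names are hypothetical, and the per-source values are those given for the ground test case.

```python
# Failures per million flights for one channel; Prm is set equal to Pcpu.
CHANNEL_FAILURES_PER_MILLION = {
    "Pps": 80,
    "Pcpu": 80,
    "Phsdb": 80,
    "Pbus": 40,
    "Prm": 80,
}
N_CHANNELS = 2  # the flight system is dual redundant

per_channel = sum(CHANNEL_FAILURES_PER_MILLION.values()) / 1_000_000

# Either channel failing grossly before liftoff aborts the flight.
pre_liftoff_abort = N_CHANNELS * per_channel
```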

4.2.6.2.2 MDF and FA Rates.— The main difference between the Ground Test System and the Flight 
System is that the latter is dual redundant. Ph is then double that for a single channel HMS system, 
approximately 0.004 failures per million hours. Based on each channel having a 95% BIT coverage on itself and 
a coverage of 95% on the other channel’s uncovered faults, the value of Ch for a dual channel system is typically 
0.9975. 
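
The dual channel coverage value follows from combining the two coverage figures. The short Python sketch below shows the arithmetic; the function name `dual_channel_coverage` is hypothetical.

```python
def dual_channel_coverage(self_coverage, cross_coverage):
    """Coverage when each channel also covers the other channel's uncovered faults."""
    uncovered = 1.0 - self_coverage
    return self_coverage + uncovered * cross_coverage


ch_dual = dual_channel_coverage(self_coverage=0.95, cross_coverage=0.95)
```

The 5% of faults missed by a channel's own BIT are covered 95% of the time by the other channel, leaving 0.05 x 0.05 = 0.0025 uncovered.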

The coverage parameter, Ch, has a somewhat different meaning for the Flight HMS. Coverage is 
formally defined as the ability to detect and isolate the fault and then reconfigure around it. The Flight HMS 
need only fail safe, as the Space Shuttle can fly without the HMS, although it represents a statistical risk. That 
risk is small because the chance of both an HMS failure and an engine failure is relatively remote. Hence, for fail 
safe operation, the ability to isolate the fault is restricted to the HMS as a whole, not any particular channel. 
Furthermore, fail safe is a very simple form of reconfiguration. 

To assess the impact of the hardware on the total MDF and FA rates, it was assumed that the algorithm 
was perfect, that is, the Ca parameter was 1.000 and Pfa was 0.0000. The resulting MDF and FA rates are, 
respectively, 1.3E-5 and 1.7E-6 events per test. 

The goal of the hardware implementation was for the hardware contribution to the MDF and FA rates to 
be an insignificant (approximately 10% or less) portion of the total MDF and FA rates. Unfortunately, these 
rates are determined by a vehicle level Markov Model. In the vehicle level model, the end state probabilities 
(loss of vehicle, loss of crew, etc) are defined by NASA. The MDF and FA rates are dependent variables in this 
case and become the requirements of the HMS. Correspondingly, they will dictate the Ca and Pfa parameters. 

The Markov Model in Figure 4.13 is very complex to analyze since there are three engines and there are 
several windows for mission abort and crew escape. It is apparent that Ca and Pfa are inversely related and 
must be traded in terms of vehicle reliability goals. Nevertheless, Figure 4.13 does demonstrate the 
relationship between the HMS MDF/FA rates (including both hardware and software/algorithm reliability 
effects) and the vehicle safety goals. 

Fig. 4.13 TYPICAL VEHICLE LEVEL MARKOV MODEL FOR THREE ENGINE/MANNED VEHICLE 
(Shows complex relationship between customer defined end states and probability of missed 
detection of fault, and false alarm) 

4.3 Optimization Tool for Hardware/Software Integration 

UTRC has a variety of tools to assist in the design and evaluation of hardware and software architectures. 
One tool, ADAS (Architectural Design and Simulation), was used to demonstrate the optimization procedure 
during the design of HMS hardware architecture. 

4.3.1 Architectural Design And Simulation (ADAS) Description.— Based on Petri nets, ADAS enables a 
designer to map the data flow of an electronic system to its block diagram and simulate system execution. The 
designer can then examine the results of simulation for hot spots and bottlenecks. With this tool set, the 
designer can iterate to a balanced architecture with hardware resources allocated to achieve uniform 
utilization. 

Developed by the Center for Digital Systems Research at Research Triangle Institute, ADAS was aimed 
at the design of large scale integrated circuits. Such circuits have outgrown the traditional breadboard, and 
more efficient design methods were essential. 

Rather than fabricating a breadboard, an ADAS user constructs the data flow diagram and may also 
construct a hardware block diagram with a graphical editor, EDIGRAF. The mapping of hardware to the data 
flow diagram may be performed by ASH, the hardware to software mapping utility. Alternatively, the mapping 
may be directly assigned or overridden by the user. This mapping is illustrated in Figure 4.14. 

In this Figure, Tasks 1 and 2 are mapped to separate CPUs and therefore can operate in parallel. Since 
Task 3 requires results from both Tasks 1 and 2, the beginning of Task 3 is postponed until those results are 
available. Because Task 3 is executed on the same CPU as Task 2, the Task 2 results are available immediately 
upon conclusion of Task 2. The Task 1 results, however, suffer an additional delay for communication. This is 
accounted for in the DATA XFER block of the software data flow graph, which is mapped to the BUS block of 
the hardware connectivity graph. To the left of this figure, the method of accounting for utilization is 
illustrated. Each block in the data flow graph is allocated a delay time consistent with the amount of time 
required to process its work on the hardware available to it. Each connecting line (arc) in this graph is able to 
hold tokens up to a user defined maximum number. When all input arcs to a block contain at least a threshold 
number of tokens, then the block may prime. On priming, the block consumes a user defined number of 
tokens from each separately defined input arc. After the appropriate delay for this block, the block fires and 
produces a user definable number of tokens on its output arcs. 
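
The priming and firing rules described above can be sketched in a few lines of Python. This is a minimal illustration of the token mechanism only, not a reproduction of ADAS or its GIPSIM simulator: the class names, the `simulate` function, and the delay values are all hypothetical, each block fires only once, and contention between blocks mapped to the same hardware resource is not modeled.

```python
import heapq
from dataclasses import dataclass


@dataclass
class Arc:
    """Connecting line that holds tokens up to a user-defined maximum."""
    capacity: int
    tokens: int = 0


@dataclass
class Block:
    """Data flow node that primes when every input arc reaches its threshold."""
    name: str
    delay: float     # processing time on the hardware the block is mapped to
    inputs: list     # tuples of (arc, threshold, tokens consumed on priming)
    outputs: list    # tuples of (arc, tokens produced on firing)
    busy: bool = False

    def can_prime(self):
        return all(arc.tokens >= threshold for arc, threshold, _ in self.inputs)

    def prime(self):
        for arc, _, consumed in self.inputs:
            arc.tokens -= consumed

    def fire(self):
        for arc, produced in self.outputs:
            arc.tokens = min(arc.capacity, arc.tokens + produced)


def simulate(blocks):
    """Prime and fire each block once; return {block name: firing time}."""
    now, sequence, pending, fired = 0.0, 0, [], {}
    while True:
        for block in blocks:
            if block.name not in fired and not block.busy and block.can_prime():
                block.prime()
                block.busy = True
                heapq.heappush(pending, (now + block.delay, sequence, block))
                sequence += 1
        if not pending:
            return fired
        now, _, block = heapq.heappop(pending)
        block.fire()
        block.busy = False
        fired[block.name] = now


# Hypothetical delays (arbitrary time units) for the Figure 4.14 example.
t1_to_bus, bus_to_t3, t2_to_t3 = Arc(1), Arc(1), Arc(1)
blocks = [
    Block("Task 1", 4.0, [], [(t1_to_bus, 1)]),
    Block("Task 2", 5.0, [], [(t2_to_t3, 1)]),
    Block("DATA XFER", 2.0, [(t1_to_bus, 1, 1)], [(bus_to_t3, 1)]),
    Block("Task 3", 3.0, [(bus_to_t3, 1, 1), (t2_to_t3, 1, 1)], []),
]
firing_times = simulate(blocks)
```

With these illustrative delays, Task 2 finishes at time 5, but Task 3 cannot prime until the Task 1 result has crossed the bus at time 6, which is the communication delay the DATA XFER block is meant to capture.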

Once mapped, the software or data flow graph is exercised. GIPSIM determines hardware utilization 
and latency by simulation, while XPETRI provides the user with the same information by Petri net analysis. A 
consistency checker, CONCH, helps the user validate the model, and a report generator, DBPRINT, assists with 
documentation. The hardware description languages HELIX and ISPS may be used, and the functionality of the 
nodes may be defined by Ada or C program code. 

4.3.2 ADAS Model.— During the HMS architecture definition process, a candidate architecture was 
extracted, simplified, and modeled in ADAS as an example of how the tool set could benefit the system design. 
The first iteration results, shown in Figure 4.15, demonstrate an unbalanced architecture with inadequate 
computational resources. This first iteration is unable to keep up with the required system input rate, as 
illustrated by the less than 100% utilization of the input data block. With the addition of more computational 
hardware, as illustrated in Figure 4.16, the architectural balance is improved and the system becomes able to 
meet the HMS requirements. It must be noted that adding more processing power is not always beneficial. In 
fact, when the system is communication limited, adding more processing hardware can actually slow the 
output because more time is wasted contending for access to the overtaxed data buses. This scenario is 
analogous to the traffic arteries of a city at rush hour, when adding more vehicles to move people results in a 
slowdown. In the case of the HMS, however, even with the tripling of processing power, the buses and links are 
lightly loaded. Such an observation presents the designer with an opportunity to explore the possibility of 
replacing those buses and links with slower, less expensive implementations. 

Fig. 4.14 ADAS TOOL UTILIZED IN HMS HARDWARE ARCHITECTURE STUDY 
ADAS system modeling maps software data flow onto hardware connectivity. 

Fig. 4.15 ADAS RESULTS SHOW LOADS FOR SIMPLE ARCHITECTURE 
The 43% utilization of the input data block indicates the simple architecture is CPU limited. 

Fig. 4.16 ADAS RESULTS SHOW LOADS FOR COMPLEX ARCHITECTURE 
100% utilization of the input data block indicates that a triple processor architecture supports 
the full workload. 
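
The utilization figures reported for the two architectures (43% of the input data block for the simple case, 100% for the triple processor case) are consistent with a simple linear scaling argument. The Python sketch below is an assumption-laden illustration, not an ADAS result: it supposes that each added processor contributes the same throughput and that the buses never become the limit, which the text notes is not true in general. The function name is hypothetical.

```python
def input_utilization(n_processors, per_processor_fraction):
    """Fraction of the required input rate that can be served, capped at 100%.

    Assumes throughput scales linearly with processor count and that the
    system is CPU limited rather than communication limited.
    """
    return min(1.0, n_processors * per_processor_fraction)


single = input_utilization(1, 0.43)
double = input_utilization(2, 0.43)
triple = input_utilization(3, 0.43)
```

Under this assumption two processors would still fall short of the required input rate, while three would exceed it, leaving the input data block fully utilized.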

Other factors can override the desire to balance the architecture. For instance, the links are modeled 
after the 1553B which was designed to provide the reliability required for military aircraft. Because the design 
of the 1553B is already paid for, and because of its wide ranging compatibility, substitution of a lower 
performance bus in this case would probably increase the cost rather than decrease it. Although ADAS helps 
the designer to assess the relative performance of design alternatives, the decisions are still the purview of the 
designer. 


4.4 Summary 

An HMS strawman hardware architecture with a high degree of functionality has been presented. The 
actual hardware will be a subset of this system. Many requirements were discussed which in turn necessitated 
a modular HMS approach. A throughput study was performed along with an assessment of power, weight, and 
reliability requirements. 

The Ground Test HMS was envisioned as a VME based system. The notable feature of the architecture 
was the use of DPFGs. A typical DPFG consists of a private bus to transfer data from the I/F card to the DSP 
card, and finally to the CPU card. The DSP board will be based on a special purpose floating point FFT 
processor such as one manufactured by UTMC. The CPU card will be based on the Intel 80960, with the 
Motorola 68030 as an alternate choice. The software will be written in Ada per 
MIL-STD-1815 and developed per DoD-STD-2167A. The hardware will be housed in a six foot, 19 inch rack 
for operation in the control room. Support equipment will include a VME based development test system for 
simulating the rocket engine and testing the HMS electronics. 

The Flight System will be a logical adaptation of the Ground Test HMS, and will be dual redundant both 
in control and the HMS. The differences in hardware will be the dual redundant HM channels, and the 
elimination of facility sensor hardware and Mass Storage hardware. The maximum weight of the Flight HMS, 
including sensors and cables, will be 182 pounds. The maximum power consumed will be 1200 watts, and the 
minimum MTBF will be 4200 hours. The desired goal is that the Ground Test and Flight HMS share identical 
electronic design. The main difference between the two is that the Flight HMS will be dual redundant and 
repackaged to withstand the harsh flight environment. The benefits of this approach include the re-usability of 
the software code, resulting in cost, schedule, and risk reduction. Also, the hardware design will have been 
adequately debugged prior to the Flight HMS repackaging phase, further minimizing the risk of Flight HMS 
implementation. 

Two design techniques were demonstrated during this task. Markov modeling was used to determine 
abort, False Alarm, and Missed Detection of Faults rates. The second technique, ADAS, was used to optimize 
the hardware/software configuration for the Ground Test and Flight HMS. ADAS is one of many tools that are 
available for design and verification of an HMS. 

SECTION 5.0 

IMPLEMENTATION PLAN FOR THE HEALTH MANAGEMENT SYSTEM 


This implementation plan reflects the UTRC design methodology for the development of a breadboard 
HMS. A systematic approach for the definition of system requirements, hardware and software development, 
system validation and verification, and teststand integration was taken. The purpose of this implementation 
plan is to provide a vehicle from which the actual and detailed implementation program plan can be derived. 
The intent is to show a variety of cost, capability, and complexity options that can be tailored to match the 
eventual scope of the implementation program. 

Two health management systems will be developed (one will remain at UTRC, the other at the SSME 
teststand). The modular HMS provides for a phased implementation in which the total HMS is designed as a 
baseline system which is expanded by a number of subsystem options. The baseline system provides a 
near-term, low risk implementation of the algorithms developed as part of Phase I, Task 2. Subsystems 
centered around a particular sensor or sensing technology, which will increase the diagnostic capabilities of 
the HMS, are presented as optional additions to the baseline system. A technology program is included to fill 
the near-term technology voids. 

The Baseline HMS consists of a Data Logging System, CADS Serial Data Link and a Health Monitoring 
Function. A general purpose workstation/minicomputer will provide the following non time-critical 
functions: user interface; system task manager; off line analysis; database; database manager; and a 
communications link. The critical real-time health monitoring and data logging functions will be 
implemented in a system to be added onto the workstation. The baseline system itself will be implemented in 
stages to allow for a smooth integration of the baseline and subsystems into the teststand environment. 

The CADS Serial Data Link and Data Logging System will be ready for teststand integration and 
operation 10 months after the start of the program. This will provide data for algorithm verification and 
support, as well as feedback regarding issues encountered during the integration phase that can impact the 
implementation of the algorithms in the Health Management portion of the baseline system. Teststand 
integration and operation of the complete Baseline HMS will begin 24 months after program start. 

The verification process is a substantial and integral portion of the implementation plan. It is extremely 
important to verify that the HMS can detect and accommodate faults in a timely fashion to minimize 
component degradation and prevent the development of catastrophic situations. An ideal scenario for 
complete performance validation and verification would involve installing the HMS on an engine and 
conducting a number of full scale engine tests run intentionally with component abnormalities. Clearly, this is 
unrealistic in terms of the costs and schedules involved. Instead, a significant amount of resources and effort 
have been allocated to the development of verification tools such that hardware, software, and algorithms for 
the Baseline HMS and subsystems will be verified prior to system integration at the teststand. 

A systematic method of validation and verification will integrate component and subcomponent bench 
testing with simulations for algorithm testing. The hardware and software will be tested in both non real-time 
and real-time environments. The HMS will be tested in real-time simulation to assess the impact of computer 
cycle times, memory requirements, and data transfer rates on its ability to successfully perform safety related 
functions. 

There are inherent limitations on the systems which will be used to determine the reliability of the HMS. 
These include how well the test conditions resemble the real SSME environment, how accurate the 
engine models are, and how responsive and reliable the algorithms are. These questions will be answered during 
the implementation and testing phase of the program. 

The phased implementation plan for the modular HMS is divided into seven tasks. The HMS consists of 
the Baseline HMS and Data Logging system, and six separate subsystems which can be integrated into the 
HMS. These subsystems are options to be selected individually or in combination to provide additional 
functionality and capability to the baseline HMS. Some subsystems require support from a Near Term 
Technology Development Program. 

The following paragraphs describe the work to be accomplished in these tasks, and discuss how program 
goals are to be met. The 48 month program schedule is summarized in Figure 5.1. A Work Breakdown 
Structure (WBS) (Figure 5.2) and a summary of man months and other direct costs (Table 5.1) are included as 
supplements to this implementation plan. 

5.1 Baseline System Discussion: 

5.1.1 Task 1- Baseline System Implementation: CADS Serial Link. 

Task 1.1 Preliminary Design - The system requirements defined in HMS Phase I, Task III, will be 
reviewed to determine if they are consistent and complete. Customer requirements, program management 
requirements, HMS statement of work, and technical specification documents will be revisited. If the 
requirements are found to be insufficient, further requirements will be allocated to the HMS functions, and 
design modifications incorporated as required. A system segment specification (SSS) detailing the hardware 
and software modules will be produced. 

Task 1.2 Algorithm Verification and Support. — ARMA models, nonlinear regression, and the clustering 
detection algorithms identified and developed in HMS Phase I, Task 2 will be verified and refined for optimal 
performance. Nonlinear regression techniques will be used during SSME startup and shutdown phases, while 
the ARMA models and clustering detection algorithms will be used for fault detection during mainstage 
operation. As the actual numerical constants and detection thresholds used in the algorithms were only 
preliminary, this task will encompass algorithm constant/threshold selection and algorithm verification. 

The verification process will include running the algorithms on CADS data from a number of SSME 
nominal and failure tests to establish algorithm performance with respect to correct fault detection rates, 
detection times, and robustness. Algorithm constants, confidence intervals, and thresholds will be established 
to minimize false alarm rates and detection times. False alarm rates will be assessed and quantified. 

This task will support algorithm development during the period that the hardware is in its 
implementation phase. Once the system reaches the teststand, algorithm changes and tweaking will be 
supported by the teststand integration and operation effort. 

Fig. 5.1 HMS IMPLEMENTATION PLAN SCHEDULE 
(Schedule chart covering Years 1 through 4; each asterisk represents 1 month.) 

Task 1 Baseline System: CADS Serial Link 
Task 2 Subsystem I: Low Frequency FRS 
Task 3 Subsystem II: Plume Spectroscopy 
Task 4 Subsystem III: High Frequency Raw FRS 
Task 5 Subsystem IV: High Frequency Raw CADS 
Task 6 Subsystem V: Vibration 
Task 7 Subsystem VI: Near Term ATD Sensor 

Each task n is scheduled by the following subtasks: 

n.1 Near Term Technology Development 
n.2 Preliminary Design 
n.3 Algorithm Verification & Support 
n.4 Hardware Development Procedure 
n.5 I/O Software Development 
n.6 Algorithm Software Development 
n.7 Verification Tool Development 
n.8 Test Stand Integration 
n.9 Program Management 

Fig. 5.2 HMS WORK BREAKDOWN STRUCTURE 

Task 1.3 Hardware Development Process. —Hardware will be developed to support two health 
management systems (one to remain at UTRC, the other at the teststand). A modular design will be 
implemented for the HMS, as it allows for a flexible implementation and room for growth. A general purpose 
workstation/minicomputer will provide the following non time-critical functions: 

-user interface 
-system task manager 
-off line analysis 
-database 
-database manager 
-communications link 

The critical real-time health monitoring and data logging functions will be implemented in a baseline 
HM System on a commercial bus architecture to be integrated with the workstation. The baseline system will 
consist of the workstation and VME cards for the following: CADS serial link interface and CPU; a main 
system CPU; mass storage system; VME busses. 

A Development Test System (DTS) will be acquired and modified for use as an HMS verification tool. 

Task 1.4 I/O Software Development Process. — An I/O interface requirement specification (IRS) and a 
software requirement specification (SRS) will be generated. Following these, detailed software will be 
designed, coded in ADA, and downloaded into the system. Hardware/Software integration and testing will be 
performed using the DTS. A system test plan will be developed and test procedures written for the complete 
system verification and test prior to teststand integration. 

Task 1.5 Algorithm Software Development. — The algorithmic techniques of Task 1.2 will be implemented 
within the software structure of the HMS. Software for the fault detection algorithms will be developed and 
coded in ADA per an algorithm IRS and SRS. 

Task 1.6 Verification Tool Development.— The verification process is an integral portion of the HMS 
Implementation Plan. It is not appropriate to simply validate system operation at the teststand. Once the 
teststand integration phase is reached, any problems incurred will have a substantial impact upon the 
program in terms of cost and scheduling. Therefore, a significant amount of resources and effort have been 
allocated to the development of verification tools so that hardware, software, and algorithms for the baseline 
HMS and subsystems will be verified prior to system integration at the teststand. 

Appropriate hardware and software integration tools will be selected for development. The HMS 
verification tools will include such tools as ADAS, the DTS, and the COMDISCO system, described below. 

ADAS will be employed to analyze the databus and CPU loadings and provide an optimal architecture 
for the HMS and Data Logging System. ADAS maps software data flow graphs onto the hardware set and 
produces a hardware connectivity graph which optimizes the design. 

The Development Test System (DTS) is a stand alone system to be used for component/subcomponent 
bench testing and system integration testing of the complete hardware and software packages. The DTS will 
simulate the inputs to the HMS. Software enhancements to the DTS will be supported under this task. These 
enhancements might include, for example, the capability to simulate sensor loss/failure in order to validate 
teststand integration, as well as to test algorithm robustness. 

COMDISCO will be used in the development, testing, and verification of the fault detection algorithms. 
The COMDISCO system is a workstation based design environment which allows one to model signal 
processing systems and to simulate and fine tune DSP algorithms. The fault detection algorithms will be 
simulated on the COMDISCO system, and deficiencies will be highlighted and corrected. 

Task 1.7 Test Stand Integration and Operation. — Teststand integration and support of the Baseline CADS 
Serial Link HMS will be provided for at both the teststand and at UTRC. UTRC will train one person in the 
operation of the HMS. 

Integration and operation of the baseline hardware at the teststand will begin 10 months after program 
start. The workstation and Baseline CADS Serial Link hardware and Data Logging system will be integrated 
on the teststand and perform data logging functions while the HMS algorithms are still in their 
verification/implementation phase. Not only will this provide for a smoother integration of the baseline HMS 
and Subsystems and algorithm support via the data logging, but it will also provide essential feedback which 
can impact the development process. Teststand integration and operation of the complete Baseline HMS will 
begin 24 months after program start. 

Task 1.8 Program Management. —This task includes the effort to plan and direct the project as well as the 
preparation and attendance at various levels of reviews. The task also includes the preparation of 
documentation and reports. The traditional management tasks of high visibility, rapid problem solving, and 
accurate program control will be followed rigorously. Special emphasis will be placed upon interaction with 
NASA personnel between program tasks. 

Task 1.9 Near Term Technology Development. — Expert Systems techniques will be developed and 
incorporated to manage the diagnostic information from the baseline HMS and subsystems. A high level 
automatic decision making process will utilize a priori knowledge about SSME and LRU operation, as well as 
incorporate the diagnostic information from the subsystems. The a priori knowledge could include, for 
example, information such as turbopump efficiencies, component wear, engine test history, and teststand 
conditions. The decision making scheme will use expert system rules to determine engine health. 

5.2 Subsystem Description 

A modular HMS has been proposed for its capability of phased implementation and its flexibility. With 
this in mind, six subsystems have been defined as HMS options which can be added onto the baseline system. 
These subsystems may be selected individually, or in combination, to provide additional HMS functionality 
and capability. It is stressed that these subsystems are not to be implemented as stand-alone systems, but 
rather as additions onto the baseline HMS. Some subsystems require development efforts covered by the 
Near Term Technology Program to enable their implementation. Each subsystem selected will be developed 
according to each of the subtasks described under the baseline system. 

5.2.1 Task 2. Subsystem I: Low Frequency Facility Data Link. — The Low Frequency Facility Data Link will 
interface to the Facility Recording System (FRS) as it currently exists and provide the HMS with the 
conditioned facility sensor data. This subsystem will employ the fault detection algorithms utilized in the 
baseline CADS Serial Data Link, and will therefore require minimal technology development. An optimal subset 
of FRS sensors will be selected for the implementation. The algorithms will be run on data from a number of 
SSME nominal and failure tests to verify their operation and performance, as well as to tweak algorithm 
constants, confidence intervals, and thresholds. False alarm rates will be assessed and quantified. The 
hardware required for this subsystem will include VME cards for the FRS data interface and the subsystem 
CPU. 
Teststand integration of the Low Frequency Facility Data Link will begin 24 months after program start. 

5.2.2 Task 3. Subsystem II: Plume Spectroscopy.— A plume spectroscopy system and its associated 
processing algorithms are assumed to be developed outside of the scope of this program. A plume 
spectrometer detects infrared emission and absorption of ionized species within the SSME plume. The 
spectral lines of these species can be correlated to internal engine erosion and wear. The output of a ’sensor’ 
will be the intensity of preselected spectral lines which correspond to the species of interest. 

The Optical Plume Anomaly Detector System, OPADS, will exist at the teststand and interface with the 
HMS via a VME card. Extensive testing will be required to develop a database with sufficient detail to identify 
intensity patterns which correlate with faults. Expert Systems techniques, developed under the Near Term 
Technology Development task, will be used to integrate the additional diagnostic capabilities into the HMS. 

Teststand integration of the Plume Spectroscopy Subsystem will begin 24 months after program start. 

5.2.3 Task 4. Subsystem III: High Frequency Raw Facilities Data Link. —The HMS will interface directly to a 
specified set of Facility data lines before the signals are conditioned by the Facility Recording System. This 
will provide the HMS with high frequency, unconditioned data which will contain more diagnostic information 
than that which is conditioned and recorded by the FRS. The HMS interface will be isolated such that the 
integrity of the Facility data will be maintained. A subset of the FRS data lines will be selected for 
implementation in the HMS failure detection algorithms. 

The hardware required for this subsystem will include VME cards for the interface, A/D and signal 
processing, and CPU(s). The number of signal processing cards and CPUs required will depend upon the 
number of facility sensors selected. Signal processing and fault detection algorithms will be developed under 
the Near Term Technology Development task. Hardware and Software modules will be implemented, tested, 
and verified using the DTS, ADAS, and COMDISCO prior to teststand integration and testing. 

Teststand integration of the High Frequency Raw Facilities Data Link Subsystem will begin 30 months 
after program start. 

5.2.4 Task 5. Subsystem IV: High Frequency Raw CADS Data Link. — The High Frequency Raw CADS Data 
Link HMS Subsystem will follow the high frequency Facility effort and will be similar in function to the Raw 
Facilities Data Link. Because it builds on that effort, it cannot be implemented without its predecessor. The 
HMS will interface directly to a specified subset of the CADS data lines, before the signals are conditioned by 
the SSME controller. This will provide the HMS with high frequency, unconditioned data which will contain more 
diagnostic information than that which is conditioned by the controller. The HMS interface will be isolated 
such that the integrity of the CADS data entering the controller is maintained. 

The hardware required for this subsystem will include VME cards for the interface, A/D and Signal 
Processing, and CPU(s). The fault detection algorithms developed for Subsystem III, the High Frequency Raw 
Facilities Data Link, will be utilized in this subsystem’s implementation and therefore will require minimal 
technology development effort. Hardware and Software modules will be implemented, tested, and verified 
using the DTS, ADAS, and COMDISCO prior to teststand integration and testing. 

Teststand integration of the High Frequency Raw CADS Data Link Subsystem will begin 36 months after 
program start. 
5.2.5 Task 6. Subsystem V: Advanced Vibration Analysis. — The Advanced Vibration Subsystem for the 
HMS will interface to the existing accelerometer suite to obtain vibration data. The algorithms for the 
Vibration Subsystem of the HMS will not duplicate the efforts of FASCOS, but will involve more advanced 
vibration signature analysis. 

The hardware required for this subsystem will include VME cards for the interface, A/D and signal 
processing, and CPU. Signal processing and fault detection algorithms will be developed under the Near Term 
Technology Development task. Hardware and Software modules will be implemented, tested, and verified 
using the DTS, ADAS, and COMDISCO prior to teststand integration and testing. 

Teststand integration of the Vibration Subsystem will begin 33 months after program start. 

5.2.6 Task 7. Subsystem VI: ATD Sensor.— A number of advanced sensing technologies which provide 
additional diagnostic information about SSME health are being developed under the Alternate Turbopump 
Development (ATD) Program. The technologies being considered for the HMS subsystem are: 

-Acoustic Emission 
-Fiber Optic Deflectometer 
-Optical Pyrometer 

Acoustic Emission sensors monitor high frequency stress waves which result from the interaction of 
bearing components. Features extracted from time and frequency domains are analyzed by correlation and 
neural network techniques to identify the state of operation of the bearing. As both time and frequency 
analysis are required for the AE techniques, a dedicated signal processing group must be developed. 

Fiber optic deflectometers use light reflections to measure outer race deflections due to bearing passage 
in order to quantify bearing and race condition. Frequency analysis is required to extract bearing component 
harmonics. These highly computational algorithms must be developed on dedicated hardware to provide 
real-time diagnostic information. 

Fiber optic probes with indium-gallium-arsenide detectors measure radiant energy from turbine blades 
during engine operation to provide a linear map of blade temperature. As computational requirements are 
high, special dedicated processing will be developed. 


Note that the signal processing and fault detection algorithms associated with each sensor will be 
developed outside of the scope of the HMS Program, most likely as part of the ATD Program. 


Development of this subsystem requires that the production ATD pumps incorporate one or more of the 
above mentioned sensors. As the ATD Program continues, these sensors will be evaluated for their 
implementation feasibility. The Near Term Technology Development task will support the modifications to 
fault detection and signal processing algorithms which will be required to include the sensor(s) as part of the 
HMS. 
The hardware required for this subsystem, though somewhat dependent upon the sensing technologies 
selected, will include VME cards for the sensor interface(s), A/D and signal processing, and CPU(s). 
Hardware and Software modules will be implemented, tested, and verified using the DTS, ADAS, and 
COMDISCO prior to teststand integration and testing. 

Teststand integration for the Near Term ATD Sensor Subsystem will begin 36 months after program 
start. 
SECTION 6.0 
CONCLUSIONS 


UTRC has developed a focused HMS framework design for the SSME. The UTRC HMS framework 
integrates fault detection algorithms with proven sensor technologies to provide monitoring during all phases 
of the SSME operation. Three fault detection algorithms, ARMA, RESID, and clustering, form the proven 
core of a hierarchical decision process that was mapped onto a state-of-the-art, off-the-shelf hardware 
architecture capable of real-time operation. Key elements of the UTRC HMS framework are: 


• All phases of SSME operation covered; 

• Three algorithmic approaches used to cover faults with different manifestations; 

• Algorithms demonstrated 100% detection of faults for the test database using only 
CADS sensor information; 

• Low preliminary false alarm rate; 

• Robust to sensor loss; 

• Minimal algorithm complexity; 

• Modular hardware architecture provides flexibility, reliability, and maintainability 
while allowing real-time operation of the fault detection algorithms; 

• Phased implementation provides near term benefits; 

• Clear migration path to a Flight HMS established in the hardware design. 


UTRC has demonstrated the feasibility of a focused HMS that can provide immediate benefits on the 
SSME teststand. A low-risk, phased implementation plan will provide near term enhancements to safety 
while allowing the incorporation of advances in algorithm and sensor technologies as they become available. 
The successful demonstration by UTRC of the essential fault detection strategies along with a viable hardware 
design provides the necessary framework from which the SSME HMS can be implemented. 
APPENDIX A 

FAILURE DETECTION ALGORITHM RESULTS 

This appendix contains a description of each of the sixteen SSME tests from which data were available 
for analysis, and also a description of the results of the HMS failure detection algorithms. Descriptions of the 
failures are taken directly from the Rocketdyne SAFD Phase II Report [1]; no efforts to further 
explain the causes and propagation of the failures were made. A plot of the MCC_PC is included for each test, 
as the Main Combustion Chamber Pressure is proportional to the engine power level, and thus reflects 
the test thrust profile. (Note: an MCC_PC of 3006 psia = 100% RPL). 
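
The note above fixes the scale between chamber pressure and power level, so the power levels quoted in the test descriptions can be read off an MCC_PC plot with a single ratio. A minimal sketch follows, assuming the proportionality holds exactly; the function and constant names are illustrative and do not come from the report.

```python
# Scale factor taken from the note in the text: MCC_PC of 3006 psia = 100% RPL.
RPL_100_PSIA = 3006.0


def mcc_pc_to_rpl_percent(mcc_pc_psia: float) -> float:
    """Convert main combustion chamber pressure (psia) to percent of rated power level.

    Assumes MCC_PC is exactly proportional to power level (an idealization).
    """
    return 100.0 * mcc_pc_psia / RPL_100_PSIA


def rpl_percent_to_mcc_pc(rpl_percent: float) -> float:
    """Inverse conversion: percent RPL to the corresponding MCC_PC in psia."""
    return RPL_100_PSIA * rpl_percent / 100.0


if __name__ == "__main__":
    for level in (65.0, 100.0, 109.0):
        print(f"{level:5.1f}% RPL -> {rpl_percent_to_mcc_pc(level):8.2f} psia")
```

Under this assumption, the 109% RPL operating point mentioned for several tests corresponds to roughly 3277 psia.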

A discussion of the failure detection algorithm results follows the failure sequence summary. Depending 
upon the failure and the PIDs available in the test dataset, one, two, or all three 
algorithms may have been applied. For each test, the results and supporting figures of data and output are 
presented for each appropriate algorithm. 

Regression analysis was appropriate during startup and shutdown sequences. A plot of the error 
between the predicted and actual MCC_PC is shown. Time series analysis failure detection was 
appropriate during mainstage operation at a steady power level. Data for suspect parameters, as well as their 
corresponding ARMA error signal correlation functions, are shown. 
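
This appendix does not spell out how the ARMA error signal correlation function is computed or thresholded. As a rough illustration only, the sketch below assumes the common practice of treating the model error (residual) as white noise under nominal conditions and flagging a problem when its normalized autocorrelation leaves an approximate 95% band of ±1.96/√N. The function names, the number of lags, and the tolerance are assumptions, not values from the report.

```python
import math


def autocorrelation(residual, max_lag):
    """Normalized autocorrelation of a residual sequence for lags 0..max_lag."""
    n = len(residual)
    mean = sum(residual) / n
    centered = [x - mean for x in residual]
    energy = sum(c * c for c in centered)
    if energy == 0.0:
        raise ValueError("residual has zero variance")
    return [
        sum(centered[t] * centered[t + k] for t in range(n - k)) / energy
        for k in range(max_lag + 1)
    ]


def looks_white(residual, max_lag=20, z=1.96, allowed_fraction=0.10):
    """Return True if few enough lags fall outside the +/- z/sqrt(N) band.

    allowed_fraction is a hypothetical tolerance for the share of lags
    permitted to leave the band before the residual is called non-white.
    """
    bound = z / math.sqrt(len(residual))
    acf = autocorrelation(residual, max_lag)
    outside = sum(1 for r in acf[1:] if abs(r) > bound)
    return outside <= allowed_fraction * max_lag


if __name__ == "__main__":
    # A slowly drifting error signal is strongly correlated with itself,
    # so it fails the whiteness check.
    drifting = [0.01 * t for t in range(1000)]
    print(looks_white(drifting))
```

A drifting parameter produces a residual whose correlation function stays far from zero at many lags, which is the kind of signature the correlation function plots in this appendix are meant to reveal.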

The Cluster algorithm was appropriate during mainstage operation. Table A-1 lists the missing PIDs for 
each test. A plot of the results (correlation coefficient) of running the cluster algorithm, with a 
reduced sensor set, on nominal SSME test data is presented. This is to show that no false alarms occur for a 
nominal engine test with the selected sensor set and detection thresholds. Finally, a plot of the Cluster 
algorithm results for the failure data is shown, along with the fault detection threshold. 

1. Test 901-110: HPOTP LOX Seal Burn 

According to the Rocketdyne SAFD Phase II report [1], during stable operation at 75% of rated power level, 
the engine controller issued a cutoff command when a fire occurred in the HPOTP. The fire started in the LOX 
primary seal drain cavity. The exact cause of the fire could not be positively determined; however, nine sources 
were determined to have the potential of causing the ignition. These are listed below: 

1) Loss of hydrodynamic lift resulting in rubbing of the primary oxidizer seal against the mating ring, 
creating enough heat to initiate burning; 2) Primary oxidizer seal bellows weld failure allowing oxygen leakage; 
3) Ignition at the interface of the bellows and its vibration damper as a result of friction; 4) Contamination in 
the primary oxidizer seal area; 5) Rubbing of the primary oxidizer seal due to changing phase (liquid to gas); 6) 
Effects of hot gas leakage past the intermediate seal into the primary oxidizer seal cavity; 7) Rubbing of the 
primary oxidizer seal against the mating ring due to mating ring vibration; 8) Leakage of hot gas containing 
hydrogen past the intermediate seal into the primary oxidizer seal cavity, creating a combustible mixture; and 
9) Other leak paths allowing communication between the drain systems. (Test conducted on 24 March 1977, 
cutoff time: t = 74.1 seconds.) 

CADS Data : A plot of the MCC_PC, shown in Figure A-1.1, depicts the engine power profile 
for the test. 
TABLE A-1 CLUSTER ALGORITHM FAULT DETECTION RESULTS 

MISSING PIDS                         EVENT DETECTION   FAULT DETECTION   TIME OF 
                                     THRESHOLD         THRESHOLD         DETECTION (SEC) 

PC CMD MISSING                       —                 —                 — 
NONE                                 .89               5/5               302.4 
18                                   .89               5/5               42.7 
18, 225, 226                         .89               5/5               8.6 
231, 232, 56, 260, 261,              .68               5/5               ♦ 
  52, 225, 226, 32 
NONE                                 .90               5/5               5.8 
58                                   .89               5/5               5.2 
231, 232                             .92               5/5               255.6 
58                                   .89               5/5               300.2 
232, 59, 18, 261, 266                .74               5/5               5.2 
52                                   .89               5/5               101.5 
58, 231, 232, 18, 52, 53             .93               5/5               102.1 
58, 231, 232, 59                     .67               5/5               50.2 
MAINSTAGE NOT ACHIEVED,              —                 —                 — 
  FAILURE DURING STARTUP 
58, 231, 232, 59                     .94               5/5               405.5 
CORRUPTED DATA 

♦ See section A-5 
Fig. A-1.1 MCC PRESSURE (PID NO. 130) FOR TEST 901-110 

Fig. A-1.2 LPFT DISCHARGE PRESSURE (PID NO. 86) FOR TEST 901-110 

Fig. A-1.3 HPFT DISCHARGE PRESSURE (PID NO. 52) FOR TEST 901-110 




Fig. A-1.9 FAILURE DETECTION USING ARMA MODELS FOR PARAMETER 
LPFT DISCHARGE TEMPERATURE FOR TEST 901-110. 
Time Series Analysis : During the mainstage operation at 75% RPL, abnormal behavior was 
detected around 16 seconds from the start for a number of parameters. Figures A-1.2 through 
A-1.9 show the data and corresponding ARMA error signal correlation function plots. 


Cluster Analysis : The clustering algorithm was not run on this data because the PC command 
data was not included in the CADS data set. This CADS parameter is required for all sensor 
data normalization. 


2. Test 901-436: HPFTP Coolant Liner Buckle 

According to the Rocketdyne SAFD Phase II report, during stable operation at 109% of rated power 
level, the following series of events occurred within the HPFTP : (1) pieces from the interstage seal pass 
through the 2nd stage platform gap, decreasing the 2nd disc cavity pressure and increasing the seal stack 
leakage into the coolant liner at approximately t = 598.5 seconds from start; (2) an interstage seal piece lodges 
in the 2nd stage shank, increasing the 2nd platform seal gap and exciting 12 stiffener vanes per revolution at 
t = 607 seconds; (3) the coolant liner begins to buckle at t = 610.35 seconds; and (4) the T/A (turn around) 
sheet metal begins movement, reducing the flow area at t = 610.44 seconds. At t = 611.06 seconds, the test was 
shut down due to a High Pressure Fuel Turbine (HPFT) discharge temperature redline. (Test conducted on 14 
February 1984, cutoff time: t = 611.06 seconds.) 

CADS Data : A plot of the MCC_PC is shown in Figure A-2.1. During the mainstage 
operation, fuel venting and propellant transfer occurred at 10 seconds from the start. Figure 
A-2.2 shows the effect of fuel venting on the HPFP_IN_PR. 

Time Series Analysis : For this test, the nominal ARMA models for parameters such as 
FPB_PC, HPFP_IN_PR, or MCC_CLNT_DS_PR over a 4 second window (100 data points) 
did not indicate any failures until the redline cutoff. However, nominal ARMA models over a 
longer window of 40 seconds (1000 data points) were effective in detecting deviations from the 
nominal, beginning around 30 seconds from the start, due to gradual drifting in parameter 
values. Because of the 1000 point window, failure detection was indicated at 70 seconds. 
Figures A-2.3 through A-2.7 show the data and corresponding ARMA error signal correlation 
function plots. 
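
The window sizes quoted above imply a sample rate and a detection latency, which the short sketch below works through. It assumes, as the text suggests, that a detection can only be indicated once a full window of data following the onset of the drift has been collected; the helper names are illustrative.

```python
def sample_rate_hz(window_points: int, window_seconds: float) -> float:
    """Sample rate implied by a window of window_points spanning window_seconds."""
    return window_points / window_seconds


def earliest_detection_time(onset_seconds: float, window_points: int, rate_hz: float) -> float:
    """Earliest time a full analysis window after the onset is available.

    Assumes detection needs one complete window of post-onset data.
    """
    return onset_seconds + window_points / rate_hz


if __name__ == "__main__":
    rate = sample_rate_hz(100, 4.0)                     # 100 points in 4 seconds
    print(rate)                                         # samples per second
    print(1000 / rate)                                  # duration of the long window, seconds
    print(earliest_detection_time(30.0, 1000, rate))    # drift onset near 30 seconds
```

Both window descriptions are consistent with 25 samples per second, and a drift beginning near 30 seconds plus a 40 second window gives the 70 second indication reported for this test.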

Cluster Algorithm : Input to the clustering algorithm consisted of the thirteen selected sensors 
(see Table 3.2a) minus those missing sensors listed in Table A-1. The event threshold was 0.89 
and the fault detector was set for a five out of five event threshold. 
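
One plausible reading of the "five out of five" setting is an M-out-of-N rule applied to the correlation coefficient: a sample counts as an event when R falls below the event threshold, and a fault is declared once five of the last five samples are events. The sketch below illustrates that reading; the function name, the strict inequality, and the sample data are assumptions, not details taken from the report.

```python
from collections import deque


def first_fault_index(r_values, event_threshold=0.89, m=5, n=5):
    """Return the index at which a fault is first declared, or None.

    An "event" is a sample with R below event_threshold (assumed convention).
    A fault is declared when at least m of the last n samples are events.
    """
    recent = deque(maxlen=n)  # sliding window of the last n event flags
    for i, r in enumerate(r_values):
        recent.append(r < event_threshold)
        if len(recent) == n and sum(recent) >= m:
            return i
    return None


if __name__ == "__main__":
    # Hypothetical R sequence: an isolated dip, then a sustained decline.
    r = [0.99, 0.95, 0.88, 0.91, 0.88, 0.87, 0.86, 0.85, 0.84, 0.80]
    print(first_fault_index(r))
```

Requiring every sample in the window to be an event means a single low reading does not raise an alarm, at the cost of delaying the declaration by the window length.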

The performance of the algorithm on the nominal data set 902-463 is shown in Figure A-2.8. 
Deviations from the correlation coefficient, R, value of 1 occur when the engine power drops to 
65% RPL; when the engine is transitioning from 104% to 109% RPL; and when the engine is 
transitioning from 65% to 100% RPL prior to shutdown. All R values remain above .89, and 
thus no false alarms occur during the test. 

The correlation values for 901-436 and the detection threshold are shown in Figure A-2.9. At the 
start of mainstage operation, the R values are above the threshold. While maintaining a 
constant power of 109% RPL, the R values decrease until fault detection at 302.4 seconds. 
Fig. A-2.1 MCC PRESSURE (PID NO. 130) FOR TEST 901-436 

Fig. A-2.2 EFFECT OF VENTING ON HPFP INLET PRESSURE 
(PID NO. 86) AT 10 SECS FOR TEST 901-436 

Fig. A-2.7 FAILURE DETECTION USING ARMA MODELS FOR PARAMETER 
MCC COOLANT DISCHARGE PRESSURE FOR TEST 901-436. 

Fig. A-2.8 THE CLUSTERING ALGORITHM RESULTS SHOW NO FALSE ALARMS 
3. Test 901-364: HPFTP Kaiser Nut Failure 


According to the Rocketdyne SAFD Phase II report, during stable operation at 109% of rated power 
level, the test shut down prematurely due to a LOX preburner pump radial accelerometer redline. The 
probable cause of the failure was a new HPFTP thermal shield retainer nut assembly, used for the first time on 
this test. The geometry of the nut allowed a direct leak path through the heat shield for the high temperature 
ASI gas, producing two jets which impinged directly upon the turbine end cap (Kaiser helmet) and reduced 
material properties in the impingement zone. The sequence of failure follows: 1) A breach in the Kaiser helmet 
occurs from a combination of heat shield vibration-induced loads, pressure differential across the thickness 
of the Kaiser helmet, and material degradation and fatigue; 2) The hot gas interrupts coolant flow and heats 
the turbine and bearings; 3) Heating produces an increase in bearing stiffness, which causes increasing 
synchronous vibrations; and 4) Synchronous vibration continues to build up until bearing failure occurs, 
followed by large rotor displacement, severe blade rubbing, and eventual blade breakage, turbine seizing, fuel 
flow stoppage, rupture of the pump inlet volute, and finally a severe fire caused by the resulting LOX-rich 
shutdown. (Test conducted on 7 April 1982, cutoff time: t = 392.15 seconds.) 

CADS Data : A plot of the MCC_PC is shown in Figure A-3.1. During the mainstage 
operation, fuel venting occurred at 100 seconds from the start, and LOX tank pressurization 
occurred at 200 seconds. Figure A-3.2 shows the effect of fuel venting on the HPFP_IN_PR. 

Time Series Analysis : During the mainstage operation at 109% RPL, abnormal behavior is 
detected around 210 seconds from the start for a number of parameters. Figures A-3.3 
through A-3.8 show the data and corresponding ARMA error signal correlation function 
plots. 

Cluster Algorithm : Input to the clustering algorithm consisted of the thirteen selected sensors 
(see Table 3.2a) minus those missing sensors listed in Table A-1. The event threshold was 0.89 
and the fault detector was set for a five out of five event threshold. 

The algorithm performance for the nominal data set 902-463 is shown in Figure A-3.9. As 
shown in the plot, deviations in the correlation coefficient, R, values occur when the engine 
power drops to 65% RPL; when the engine is transitioning from 104% to 109% RPL; and when 
the engine is transitioning from 65% to 100% RPL prior to shutdown. No false alarms 
occurred for the given threshold. 

The correlation values for 901-364 and the detection threshold are shown in Figure A-3.10. At 
the start of mainstage operation, the R values are above the threshold. When the engine 
transitions from 109% to 90% RPL, the R values descend below the detection threshold, 
causing a fault detection at 42.7 seconds. The R values increase again when the engine 
accelerates to 109% RPL, but while at a steady power level, the R values decrease until the 
detection threshold is crossed again at 130 seconds. 

4. Test 901-307: FPB LOX Post Fracture 

According to the Rocketdyne SAFD Phase II report, this test was one of several designed to determine 
the minimum LOX level upstream of the LPOP (i.e., minimum NPSH) with which the pump could operate 
without overspeed. The test terminated as designed with a redline cutoff at the elevation-level of the LPOP 
inlet duct. During operation at 109% rated power level, a High Cycle Fatigue (HCF) through crack developed 
at the fuel preburner's injector LOX post/element A-8. The fuel mixed with the LOX through this crack, 
ignited, and burned the LOX post tip. Additional damage to the fuel sleeve and faceplate followed. After 
cutoff initiation, the GH2 backflowed and ignited the residual LOX within the dome, causing the remaining 
damage. (Test conducted on 28 January 1981, cutoff time: t = 75.0 seconds.) 

CADS Data : A plot of the MCC_PC is shown in Figure A-4.1. The CADS data do not show 
operation at 109% power level as stated by the SAFD report. According to the MCC_PC plot, the 
mainstage begins at 100% power level and then drops down to 65% power level until the 
redline cutoff around 75 seconds from the start. 

Time Series Analysis : During the mainstage operation at 100% RPL, abnormal behavior is 
detected around 7.5 seconds from the start for a number of parameters. Figures A-4.2 and 
A-4.3 show the data and corresponding ARMA error signal correlation function plots. 

Cluster Algorithm : Input to the clustering algorithm consisted of the sensors (see Table 3.2a) 
minus those missing sensors listed in Table A-1. The event threshold was 0.89 and the fault 
detector was set for a five out of five event threshold. 

The performance of the algorithm on the nominal data set 902-463 is shown in Figure A-4.4. 
As shown in the plot, the significant deviations in the correlation coefficient, R, values occur 
when the engine power drops to 65% RPL; when the engine is transitioning from 104% to 109% 
RPL; and when the engine is transitioning from 65% to 100% RPL prior to shutdown. No false 
alarms occurred for the given threshold. 

The correlation values for 901-307 and the detection threshold are shown in Figure A-4.5. The 
engine is initially at 100% RPL, and transitions to 65% RPL ten seconds from start. The R 
values are initially well above the detection threshold, but rapidly begin to decrease. At 8.6 
seconds, the R values have crossed the threshold, 1.4 seconds before the power transition 
starts, causing a fault detection to be declared. Following the power transition to 65% RPL, the 
R values continue to gradually decrease until engine shutdown. 

5. Test SF10-01: FPB Injector Erosion 

According to the Rocketdyne SAFD Phase II report, during 102% of rated power level operation, this 
test was terminated when fire detectors and hazardous gas detectors triggered in the aft fuselage. Based on a 
review of the movie films, the digital data, pre-test and post-test hardware inspections, and on previous 
experience, the most probable cause of the failure was an erosion of the fuel preburner injector element H-13 
during the start transient followed by slag deposits in the fuel annulus in the sector adjacent to the liner wall. 
The resultant higher mixture ratio in the outer zone in combination with the large (.042 to .045 inches) liner end 
cap gap for this preburner (allowing hot combustion gas to flow behind the liner, diluting the coolant gas), then 
caused the burnthrough of the liner and, subsequently, the preburner body. Whether or not contamination 
played a role in the initiation of the erosion is conjecture. However, the deflection of the faceplate created a 
fuel annulus gap which was smaller than the fuel element orifices (.018 in.) designed to protect the annulus 
from contamination. (Test conducted on 12 July 1980, cutoff time: t = 104.8 seconds.) 
Fig. A-4.1 MCC PRESSURE (PID NO. 130) FOR TEST 901-307 

Fig. A-4.2 LPOP DISCHARGE PRESSURE (PID NO. 209) FOR TEST 901-307 

Fig. A-4.3 FAILURE DETECTION USING ARMA MODELS FOR PARAMETER 
LPOP DISCHARGE PRESSURE FOR TEST 901-307. 

Fig. A-4.4 THE CLUSTERING ALGORITHM RESULTS SHOW NO FALSE ALARMS 
FOR THE 902-463 NOMINAL DATA USING THE 901-307 SENSOR SUBSET. 

Fig. A-4.5 THE CLUSTERING ALGORITHM RESULTS FOR TEST 901-307. 
FAULT DETECTION OCCURRED AT 8.6 SECONDS. 
CADS Data : A plot of the MCC_PC, shown in Figure A-5.1, shows the engine power profile 
for this test. Several CADS sensors for this data set were bad or missing. 

Time Series Analysis : The CADS Data do not have sensor measurements during the redline 
cutoff time period. The ARMA models did not indicate failure during the mainstage 
operation, nor could they detect the failure at redline cutoff because of lack of data. 

Cluster Algorithm : Input to the clustering algorithm consisted of the thirteen selected sensors 
(see Table 3.2a) minus those missing sensors listed in Table A-1. The event threshold was 0.68 
and the fault detector was set for a five out of five event threshold. 
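
The detector logic is not restated in this appendix. As a minimal sketch, assume that an "event" is a sample whose correlation coefficient R falls below the event threshold, and that "five out of five" means a fault is declared once all of the last five samples are events. The function name, argument names, and sample values below are illustrative only and are not taken from the HMS software.

```python
from collections import deque
from typing import Optional, Sequence


def first_detection(r_values: Sequence[float], threshold: float,
                    m_required: int = 5, n_window: int = 5) -> Optional[int]:
    """Return the sample index at which a fault is declared, or None.

    Assumption (not from the report): a sample is an "event" when
    R < threshold, and a fault is declared when at least m_required of
    the last n_window samples are events.
    """
    recent = deque(maxlen=n_window)  # sliding window of event flags
    for i, r in enumerate(r_values):
        recent.append(r < threshold)
        if len(recent) == n_window and sum(recent) >= m_required:
            return i
    return None


# Illustrative R sequence: an isolated dip does not trip the detector,
# but five consecutive samples below the 0.68 threshold do.
r_series = [0.9, 0.6, 0.9, 0.6, 0.6, 0.6, 0.6, 0.6, 0.9]
detection_index = first_detection(r_series, threshold=0.68)
```

Under this reading, requiring five events out of five suppresses single-sample dips in R, at the cost of delaying any detection by at least four samples after the first crossing.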

The performance of the algorithm on the nominal data set 902-463 is shown in Figure A-5.2. 

Unlike the previous tests, SF10-01 had only four sensors available for input. The other sensors 
were either not in the data set, corrupted, or removed due to venting effects. The effect of using 
a small input set is apparent in the plot of the correlation coefficient, R, values. Transitions in 
power now cause significant deviations in the R values, leading to false alarms at 28 seconds, 52 
seconds, and 517 seconds. 

The correlation values for SF10-01 and the detection threshold 0.68 are shown in Figure A-5.3. 

The engine performs several power transitions, and the R values are below the threshold at 5.1 
seconds, 20 seconds, 41 seconds, and 48 seconds. Detections were declared for each of these 
crossings. Because the reduced sensor set grossly affected the performance of the algorithm 
during power transitions, the detections resulting from transitions were removed as possible 
detection times and the detection at 48 seconds was listed as the detection time. 
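
The screening of transition-related detections described above can be sketched as follows. The crossing times are those reported for SF10-01; the transition windows are hypothetical placeholders, since the report does not list the start and end times of each power transition, and were chosen only so that the example reproduces the reported outcome.

```python
from typing import List, Sequence, Tuple


def filter_transition_detections(
        crossings_s: Sequence[float],
        transitions_s: Sequence[Tuple[float, float]]) -> List[float]:
    """Drop threshold crossings that fall inside any power-transition window."""
    return [t for t in crossings_s
            if not any(start <= t <= end for start, end in transitions_s)]


# Threshold crossing times (seconds) reported for test SF10-01.
crossings_s = [5.1, 20.0, 41.0, 48.0]

# Hypothetical (start, end) transition windows in seconds; NOT from the report.
transition_windows_s = [(4.0, 7.0), (18.0, 22.0), (39.0, 43.0)]

remaining = filter_transition_detections(crossings_s, transition_windows_s)
detection_time_s = remaining[0] if remaining else None
```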

6. Test 902-198: Main Injector LOX Post Fracture 

According to the Rocketdyne SAFD Phase II report, during stable operation at 102% of rated power 
level, LOX post 61, row 12 cracked through between the primary and secondary faceplate. Test data analysis 
revealed that the LOX post failure occurred first, and subsequently did major damage to the injector. The loss 
of fuel through the primary faceplate and from the ruptured nozzle tubes resulted in an oxidizer-rich condition 
in the oxidizer preburner, and led to an HPOT discharge temperature redline cutoff at t = 8.5 seconds. (Test 
conducted on 23 July 1980, cutoff time: t = 8.5 seconds.) 

CADS Data : A plot of the MCC_PC, shown in Figure A-6.1, displays the engine power profile 
for the test. 

Time Series Analysis : During this test, the mainstage phase lasted for less than 4 seconds 
before the redline cutoff was initiated. Since the ARMA models require a window of 4 
seconds, the redline cutoff coincided with the ARMA model failure indications. 

Cluster Algorithm : Input to the clustering algorithm consisted of the thirteen selected sensors 
(see Table 3.2a) minus those missing sensors listed in Table A-1. The event threshold was 0.89 
and the fault detector was set for a five out of five event threshold. 

The performance of the algorithm on the nominal data set 902-463 is shown in Figure A-6.2. As 
shown in the plot, deviations in the correlation coefficient, R, values occur when the engine 
Fig. A-5.2 THE CLUSTERING ALGORITHM RESULTS SHOW FALSE ALARMS OCCURRING 
AT 28, 52 AND 517 SECONDS FOR THE 902-463 NOMINAL DATA USING 
THE SF10-01 SENSOR SUBSET. 

Fig. A-5.3 THE CLUSTERING ALGORITHM RESULTS FOR TEST SF10-01. 
FAULT DETECTION OCCURRED AT 48 SECONDS. 

Fig. A-6.1 MCC PRESSURE (PID NO. 130) FOR TEST 902-198 

Fig. A-6.2 THE CLUSTERING ALGORITHM RESULTS SHOW NO FALSE ALARMS 
FOR THE 902-463 NOMINAL DATA USING THE 902-198 SENSOR SUBSET. 

power drops to 65% RPL; when the engine is transitioning from 104% to 109% RPL; and when 
the engine is transitioning from 65% to 100% RPL prior to shutdown. No false alarms 
occurred for the given thresholds. 

The correlation values, R, for 902-198 and the detection threshold are shown in Figure A-6.3. 
Initially the R values are above the detection threshold but rapidly begin to decrease and cross 
the threshold at 5.6 seconds, causing a fault detection. The R values remain below the threshold 
until 6.2 seconds, and then return to their previous levels. Unlike the previous failures, the detection 
for this test results from a transient event in the system. 

7. Test 902-249: HPFTP Turbine Blade Failure 

According to the Rocketdyne SAFD Phase II report, during stable operation at 109% of rated power 
level, the test shut down prematurely due to an HPFTP accelerometer redline and the associated massive failure 
of the HPFT first stage turbine blade. The sequence of events leading to the blade failure follows: 

1. Initial turbine damage at t = 3.0 seconds. The FPB injector nonuniform flow condition experienced 
in at least two previous tests may have persisted (despite rework) and worsened. 

2. Engine fuel inlet temperature increases and the high pressure fuel pump begins to cavitate at t = 
108.0 seconds. The temperature increase was brought about by propellant transfer. The increase lowers the 
fuel density causing an increase in HPFP volumetric flowrate, speed, and power necessary to hold thrust 
constant. As the flow and speed increase, the HPFP approaches the conditions at which the suction capability 
of the hardware is exceeded, and cavitation starts. Once cavitation is initiated the efficiency of the pump 
degrades, causing an increase in the pump speed required to maintain pump output and hold thrust constant, 
causing worsening cavitation conditions and an increase in HPFT inlet temperature. 

3. Kel-F rub ring flexes and melts at t = 374 seconds. The released Kel-F particles plug nozzle tubes 
causing them to rupture, contributing to the HPFT inlet temperature increase. 

4. The first stage turbine blade fails at t = 450.52 seconds. (Test conducted on 21 September 1981, 
cutoff time: t = 450.58 seconds.) 

CADS Data : A plot of the MCC_PC is shown in Figure A-7.1. During this test, LOX side 
venting occurred at 20 seconds from the start, and propellant transfer occurred at 100 seconds 
from the start. Figure A-7.2 shows the LPOP_DS_PR with possible effects due to LOX 
venting. Figures A-7.3 and A-7.4 show the HPFP speed and the HPFP inlet temperature 
respectively. 

Time Series Analysis : During the mainstage operation at 109% RPL, abnormal behavior was 
detected at approximately 160 seconds from the start for some parameters. Figures A-7.5 
through A-7.7 show the ARMA error signal correlation function plots. 
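
The exact test applied to the ARMA error signal is not given in this appendix. One common way to read a residual correlation function, shown here only as a sketch under that assumption, is a whiteness check: if the nominal model still describes the data, the residual autocorrelation at nonzero lags should stay within roughly ±1.96/sqrt(N). All names and the sample residual sequences are illustrative.

```python
import math
from typing import List, Sequence


def autocorrelation(residuals: Sequence[float], max_lag: int) -> List[float]:
    """Normalized sample autocorrelation of the residuals for lags 0..max_lag."""
    n = len(residuals)
    mean = sum(residuals) / n
    centered = [e - mean for e in residuals]
    c0 = sum(e * e for e in centered)
    if c0 == 0.0:
        raise ValueError("residuals have zero variance")
    return [sum(centered[t] * centered[t + k] for t in range(n - k)) / c0
            for k in range(max_lag + 1)]


def lags_outside_bounds(residuals: Sequence[float], max_lag: int = 20) -> List[int]:
    """Lags (>= 1) whose autocorrelation lies outside the ~95% white-noise band."""
    bound = 1.96 / math.sqrt(len(residuals))
    rho = autocorrelation(residuals, max_lag)
    return [k for k in range(1, max_lag + 1) if abs(rho[k]) > bound]


# A strongly structured (alternating) residual over a 100-point window is
# flagged at every lag, whereas white residuals would mostly stay in the band.
structured = [1.0, -1.0] * 50
flagged = lags_outside_bounds(structured, max_lag=3)
```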

Cluster Algorithm : Input to the clustering algorithm consisted of the thirteen selected sensors 
(see Table 3.2a) minus those missing sensors listed in Table A-1. The event threshold was 0.89 
and the fault detector was set for a five out of five event threshold. 
Fig. A-6.3 THE CLUSTERING ALGORITHM RESULTS FOR TEST 902-198. 
FAULT DETECTION OCCURRED AT 5.6 SECONDS. 

Fig. A-7.1 MCC PRESSURE (PID NO. 130) FOR TEST 902-249 

Fig. A-7.2 LPOP DISCHARGE PRESSURE (PID NO. 209) FOR TEST 902-249 

Fig. A-7.3 HPFP SPEED (PID NO. 260) FOR TEST 902-249 

Fig. A-7.4 HPFP INLET TEMPERATURE (PID NO. 226) FOR TEST 902-249 

Fig. A-7.5 FAILURE DETECTION USING ARMA MODELS FOR PARAMETER 
LPOP DISCHARGE PRESSURE FOR TEST 902-249. 

Fig. A-7.6 FAILURE DETECTION USING ARMA MODELS FOR PARAMETER 
HPFP SPEED FOR TEST 902-249. 

Fig. A-7.7 FAILURE DETECTION USING ARMA MODELS FOR PARAMETER 
HPFP INLET TEMPERATURE FOR TEST 902-249. 

The performance of the algorithm on the nominal data set 902-463 is shown in Figure A-7.8. 
Deviations in the correlation coefficient, R, values occur when the engine power drops to 65% 
RPL; when the engine is transitioning from 104% to 109% RPL; and when the engine is 
transitioning from 65% to 100% RPL prior to shutdown. No false alarms resulted from the 
deviations. 

The correlation values for 902-249 and the detection threshold are shown in Figure A-7.9. The 
R values start below the detection threshold causing a detection to occur at 5.2 seconds. The 
low R values concur with the engine failure scenario which indicated flow problems within the 
pump and turbine blade failure at 3.0 seconds. 

8. Test 901-225: MOV Fretting 


According to the Rocketdyne SAFD Phase II report, during stable operation at 100% of rated power 
level the Voting Logic Cutoff Device initiated a shutdown when the HPFT discharge temperature redline was 
exceeded. Failure analysis indicates the incident was caused by fretting at the main oxidizer valve inlet 
sleeve-to-bellows flanged joint, which resulted in the initiation of a fire within the MOV. Flow oscillations at 
four times the high pressure oxidizer turbopump speed caused sufficient excitation of the MOV sleeve to 
overcome the retention screw preload, and allowed fretting between the bellows mating surfaces and shims. 
The heat generated by fretting produced ignition of the LOX environment. Metal combustion of the MOV 
caused an overpressure at the valve which increased the initial LOX flow to the main injector and raised the 
back pressure to the high pressure oxidizer turbopump (HPOTP). The back pressure increase uprated the 
HPOTP turbine power and resulted in an increase of LOX to the fuel preburner, causing the HPFT discharge 
temperature to exceed its redline. (Test conducted on 27 December 1978, cutoff time t = 255.61 seconds.) 


CADS Data: A plot of the MCC_PC is shown in Figure A-8.1. During this test at 100% RPL 
operation, one fuel flowrate PID, out of a total of 4 fuel flowrate PIDs, showed high 
levels of noise, which affected the MCC_PC. Figures A-8.2 through A-8.5 show the fuel 
flowrate PIDs, while Figure A-8.6 shows the fuel flowrate average PID. 


Time Series Analysis : During the mainstage operation at 100% power level, abnormal 
behavior was detected approximately 16 seconds from the start for the MCC_PC and the 
PBP_DS_PR. Figure A-8.7 shows the ARMA error signal correlation function plot for the 
MCC_PC. 


Cluster Algorithm: Input to the clustering algorithm consisted of the thirteen selected sensors 
(see Table 3.2a) minus those missing sensors listed in Table A-1. The event threshold was 0.93 
and the fault detector was set for a five out of five event threshold. 


The performance of the algorithm on the 902-463 nominal data set is shown in Figure A-8.8. 
Principal decreases in the correlation coefficient, R, values occur at 65% RPL and during 
power transitions. No false alarms resulted from these deviations. 

The correlation coefficients, R, for 901-225 and the detection threshold are shown in Figure 
A-8.9. Throughout the test the R values remain above the threshold until engine shutdown. 
Fig. A-7.8 THE CLUSTERING ALGORITHM RESULTS SHOW NO FALSE ALARMS 
FOR THE 902-463 NOMINAL DATA USING THE 902-249 SENSOR SUBSET. 

Fig. A-7.9 THE CLUSTERING ALGORITHM RESULTS FOR TEST 902-249. THE 
CORRELATION COEFFICIENTS START BELOW THE THRESHOLD CAUSING 
A DETECTION TO OCCUR AT 5.2 SECONDS. 

Fig. A-8.1 MCC PRESSURE (PID NO. 163) FOR TEST 901-225 

Fig. A-8.2 FUEL FLOW (PID NO. 250) FOR TEST 901-225 

Fig. A-8.3 FUEL FLOW (PID NO. 251) FOR TEST 901-225 

Fig. A-8.4 FUEL FLOW (PID NO. 252) FOR TEST 901-225 

Fig. A-8.5 FUEL FLOW (PID NO. 253) FOR TEST 901-225 

Fig. A-8.6 AVERAGE FUEL FLOW (PID NO. 131) FOR TEST 901-225 

Fig. A-8.7 FAILURE DETECTION USING ARMA MODELS FOR PARAMETER 
MCC PRESSURE FOR TEST 901-225. 

Fig. A-8.8 THE CLUSTERING ALGORITHM RESULTS SHOW NO FALSE ALARMS 
FOR THE 902-463 NOMINAL DATA USING THE 901-225 SENSOR SUBSET. 

Fig. A-8.9 THE CLUSTERING ALGORITHM RESULTS FOR TEST 901-225. THE 
CORRELATION COEFFICIENTS REMAINED ABOVE THE DETECTION 
THRESHOLD UNTIL ENGINE SHUTDOWN. 

9. Test 750-168: Valve Seal Failure 


According to Cikanek [ ], the failure was caused by anomalous augmented spark ignitor operation which 
caused cumulative damage to the OPOV downstream seal and resulted in high HPOT temperatures. (Cutoff 
time t = 300.2 seconds.) 

CADS Data : A plot of the MCC_PC, shown in Figure A-9.1, displays the engine power profile 
for the test. 

Regression Analysis : This failure occurred during the shutdown sequence. A model to predict 
the MCC_PC as a function of fuel and LOX flow rates is used to detect the failure as shown in 
Figure A-9.2. 
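
The form of the regression model is not given in this appendix. As a minimal sketch, assume a linear model MCC_PC ≈ b0 + b1·(fuel flow) + b2·(LOX flow) fitted by least squares on nominal data, with a failure indicated when the measured MCC_PC departs from the prediction by more than some limit. The linear form, the function names, the limit, and the numbers are all assumptions made for illustration.

```python
from typing import List, Sequence


def fit_linear(fuel: Sequence[float], lox: Sequence[float],
               pc: Sequence[float]) -> List[float]:
    """Least-squares fit pc ~ b0 + b1*fuel + b2*lox via the normal equations."""
    rows = [(1.0, f, o) for f, o in zip(fuel, lox)]
    # Augmented normal-equation matrix [X^T X | X^T y].
    a = [[sum(r[i] * r[j] for r in rows) for j in range(3)]
         + [sum(r[i] * y for r, y in zip(rows, pc))] for i in range(3)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        if abs(a[col][col]) < 1e-12:
            raise ValueError("flow rates are collinear; model is not identifiable")
        for r in range(3):
            if r != col:
                factor = a[r][col] / a[col][col]
                a[r] = [x - factor * y for x, y in zip(a[r], a[col])]
    return [a[i][3] / a[i][i] for i in range(3)]


def predict(coeffs: Sequence[float], fuel_flow: float, lox_flow: float) -> float:
    b0, b1, b2 = coeffs
    return b0 + b1 * fuel_flow + b2 * lox_flow


def is_anomalous(coeffs: Sequence[float], fuel_flow: float, lox_flow: float,
                 measured_pc: float, limit: float) -> bool:
    """Flag a sample whose MCC_PC residual exceeds the (hypothetical) limit."""
    return abs(measured_pc - predict(coeffs, fuel_flow, lox_flow)) > limit


# Synthetic "nominal" data in arbitrary units, generated from pc = 10 + 2*fuel + 3*lox.
fuel = [1.0, 2.0, 3.0, 4.0, 5.0]
lox = [2.0, 1.0, 4.0, 3.0, 6.0]
pc = [10 + 2 * f + 3 * o for f, o in zip(fuel, lox)]
coeffs = fit_linear(fuel, lox, pc)
```

A regression of this kind does not need a stationary window of mainstage data, which may be why it is applied to failures that occur during start or shutdown.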

Cluster Algorithm : Input to the clustering algorithm consisted of the thirteen selected sensors 
(see Table 3.2a) minus those missing sensors listed in Table A-1. The event threshold was 0.89 
and the fault detector was set for a five out of five event threshold. 

The performance of the algorithm for the nominal data set 902-463 is shown in Figure A-9.3. 

The principal deviations in the correlation coefficient, R, values occur at 65% RPL and during 
power transitions. No false alarms occurred on this data set. 

The correlation values for 750-168 and the detection threshold are shown in Figure A-9.4. The 
R values are above the detection threshold throughout the test. 

10. Test 901-284: Sensor Failure 

According to the Rocketdyne SAFD Phase II report, near the close of a nominal start, the following 
major events occurred: 

1. Channel B of the Controller cut itself off at t = 3.25 seconds (under launch conditions, this would 
have resulted in engine shutdown) due to a failure of electronic components in the facility power supply. 

2. At approximately t = 3.9 seconds, the Lee Jet orifice (used to purge the Channel A PC transducer 
passage) became dislodged and caused the PC transducer to sense the MCC coolant flow pressure instead of 
chamber pressure. This erroneous reading (3800 psi) caused the Controller to close the OPOV to reduce PC to 
the desired 3012 psi level. A few milliseconds later, the Controller calculated a mixture ratio of 9.0 and 
commanded the FPOV full open in an attempt to reduce the mixture ratio to 6.0. 

a. The immediate result of the Controller’s actions (based on an erroneous PC) was operation 
in an abnormal mode, characterized by high fuel flow and low turbine inlet temperatures of the 
oxidizer and fuel preburner. In fact, the oxidizer preburner turbine inlet temperature fell quickly to 
about 440 deg-R, which assured freezing of the water which makes up about 10% of the total flowrate 
of 40 lbs/sec. 

b. The ultimate result of the Controller’s actions was a fire in the HPOTP at about 9.7 seconds 
due to rubbing in the area of the LOX primary seal slinger. The rubbing was caused by a high axial 
Fig. A-9.4 THE CLUSTERING ALGORITHM RESULTS FOR TEST 750-168. THE 
CORRELATION COEFFICIENTS REMAINED ABOVE THE DETECTION 
THRESHOLD UNTIL ENGINE SHUTDOWN. 

load which displaced the rotor assembly toward the pump end of the HPOTP housing. This high axial 
load was caused by ice formation in the cavity between the housing and the second stage turbine wheel 
which resulted in reduction in the cavity pressure from about 2500 psi to near ambient. This reduced 
pressure on one side of the turbine wheel caused an estimated increase in rotor axial force of about 
31,000 lbs, which far exceeded the capability of the balance pistons to control the position of 
the rotor. 

3. At 9.88 seconds, the test was terminated when the high pressure oxidizer preburner pump radial 
accelerometer exceeded the 10 g redline. (Test conducted on 30 July 1980, cutoff time: t = 9.88 seconds.) 

CADS Data : A plot of the MCC_PC, shown in Figure A-10.1, displays the engine power 
profile for the test. 

Regression Analysis : This failure occurred during the startup sequence. A model to predict 
the MCC_PC as a function of fuel and LOX flow rates is used to detect the failure as shown in 
Figure A-10.2. 

Cluster Algorithm : Input to the clustering algorithm consisted of the thirteen selected sensors 
(see Table 3.2a) minus those missing sensors listed in Table A-1. The event threshold was 0.744 
and the fault detector was set for a five out of five event threshold. 

The performance of the algorithm on the nominal data 902-463 is shown in Figure A-10.3. As 
shown in the plot, the correlation coefficient, R, values drop below the threshold 
during three of the power transitions. Of these events, the crossings at 30 and 41 seconds led to 
false alarms. A preliminary analysis has identified the likely cause of the false alarms to be the 
absence of PID 59, the Preburner Pump Discharge Pressure. As discussed in Section 3.1.3.3.2, 
PID 59 is an example of a parameter whose magnitude was poorly modelled by the PBM 
estimator and thus given a larger weighting in the clustering algorithm. The loss of a highly 
weighted parameter decreases the algorithm stability during power level changes, and 
therefore produces false alarms during those transitions. 
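
The clustering algorithm and its weighting scheme are defined in Section 3 and are not reproduced in this appendix. Purely to illustrate why removing a heavily weighted input can change a correlation measure, the sketch below uses a generic weighted Pearson correlation between a measured sensor vector and a reference vector. It is not the HMS algorithm; the weights, vectors, and function names are hypothetical.

```python
import math
from typing import List, Sequence


def weighted_correlation(x: Sequence[float], y: Sequence[float],
                         w: Sequence[float]) -> float:
    """Weighted Pearson correlation coefficient of x and y with weights w."""
    total = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / total
    my = sum(wi * yi for wi, yi in zip(w, y)) / total
    cov = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    vx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    vy = sum(wi * (yi - my) ** 2 for wi, yi in zip(w, y))
    return cov / math.sqrt(vx * vy)


def without(values: Sequence[float], index: int) -> List[float]:
    """Copy of values with one sensor removed (e.g. a missing PID)."""
    return [v for i, v in enumerate(values) if i != index]


# Hypothetical normalized sensor values; the last entry stands in for a
# heavily weighted parameter such as PID 59.
measured = [1.0, 2.0, 3.0, 4.0]
reference = [1.2, 1.7, 3.4, 4.0]
weights = [1.0, 1.0, 1.0, 5.0]

r_full = weighted_correlation(measured, reference, weights)
r_reduced = weighted_correlation(without(measured, 3), without(reference, 3),
                                 without(weights, 3))
```

Comparing `r_full` with `r_reduced` on one's own data shows how much a single heavily weighted input anchors the result.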

The correlation values for 901-284 and the detection threshold are shown in Figure A-10.4. 

The R values are below the threshold throughout the test. This concurs with the fact that the 
failure occurred during startup and forced the engine into an abnormal operational state prior 
to entering mainstage. 


11. Test 750-259: MCC Duct Fracture 

According to the Rocketdyne SAFD Phase II report, during stable operation at 109% of rated power 
level, a small fuel leak developed in the MCC outlet neck (determined by film review). The leak caused less 
than 0.25% change in nominal values for the LPFP speed, discharge pressure and OPOV position. The fuel leak 
remained essentially constant until approximately 200 milliseconds prior to cutoff, at which time a major fuel 
leak occurred at apparently the same location based on both data and film review. In response to the rupture, 
the LPFP rapidly decayed in speed. This speed drop reduced the pump’s discharge pressure and the high 
pressure fuel pump (HPFTP) went into deep cavitation. As a consequence, the HPFTP speed (PID-261) 
exceeded its nominal speed by approximately 10,000 rpm. The off-nominal condition led the pump to exceed 
Fig. A-10.3 THE CLUSTERING ALGORITHM RESULTS SHOW FALSE ALARMS OCCURRING 
AT 30 AND 41 SECONDS FOR THE 902-463 NOMINAL DATA USING THE 
901-284 SENSOR SUBSET. 

Fig. A-10.4 THE CLUSTERING ALGORITHM RESULTS FOR TEST 901-284. THE 
CORRELATION COEFFICIENTS ARE BELOW THE DETECTION 
THRESHOLD THROUGHOUT THE TEST. 

its vibration redline and led to a cutoff command. Following cutoff, the fuel cavitation condition resulted in: 
reduced engine fuel flow, a severe oxygen-rich condition, burnout of the turbines, burn-through of the hot gas 
manifold, severe erosion of the gimbal bearing, and eventual separation of the engine below the low pressure 
pumps. (Test conducted on 27 March 1985, cutoff time: t = 101.5 seconds.) 

CADS Data : A plot of the MCC_PC is shown in Figure A-11.1. 

Time Series Analysis : For this test, the nominal ARMA model for a parameter such as 
LPFT_DS_PR, over a 4 second window (100 data points), did not indicate any failures until the 
redline cutoff, as shown in Figure A-11.2. However, nominal models over a longer time of 40 
seconds (1000 data points) were effective in detecting deviations from the nominal beginning 
around 12 seconds from the start for LPFT_DS_PR, and 27 seconds from the start for the 
HPFP_DS_PR in Figure A-11.3. Because of the 1000 point window, the earliest possible 
failure indication occurred at 52 seconds from the start. Figures A-11.4 through A-11.6 show 
the ARMA error signal correlation function plots for LPFT_DS_PR and HPFP_DS_PR. 
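
The window sizes quoted above imply a data rate of 25 samples per second (100 points in 4 seconds). The short sketch below works through that arithmetic. The reading that the 52 second figure is the 12 second onset plus one full 40 second window is an interpretation consistent with the numbers, not something the report states explicitly; the function names are illustrative.

```python
# 100 data points span a 4-second window, so the implied rate is 25 samples/s.
SAMPLE_RATE_HZ = 100 / 4.0


def window_seconds(n_points: int, rate_hz: float = SAMPLE_RATE_HZ) -> float:
    """Duration covered by a window of n_points samples."""
    return n_points / rate_hz


def earliest_indication_s(onset_s: float, n_points: int,
                          rate_hz: float = SAMPLE_RATE_HZ) -> float:
    """Earliest indication time, assuming a full window of data after onset is needed."""
    return onset_s + window_seconds(n_points, rate_hz)


def samples_in(duration_s: float, rate_hz: float = SAMPLE_RATE_HZ) -> int:
    """Number of samples recorded in duration_s seconds."""
    return round(duration_s * rate_hz)


long_window_s = window_seconds(1000)                 # 40-second window
first_possible_s = earliest_indication_s(12.0, 1000)  # onset near 12 s + 40 s window
samples_before_cutoff = samples_in(0.2)               # 200 ms of data
```

The same rate gives five samples in 200 milliseconds, which matches the "5 data samples" quoted for this test in the cluster algorithm discussion.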

Cluster Algorithm : Input to the clustering algorithm consisted of the thirteen selected sensors 
(see Table 3.2a) minus those missing sensors listed in Table A-1. The event threshold was 0.89 
and the fault detector was set for a five out of five event threshold. 

The performance of the algorithm on the nominal data set 902-463 is shown in Figure A-11.7. 

The decreases in the correlation coefficient, R, values are associated with the engine operating 
at 65% RPL and the transitions in engine power. No false alarms occurred for the given 
threshold. 

The correlation values for 750-259 and the detection threshold are shown in Figure A-11.8. 
Throughout the test, the R values remained well above the detection threshold until engine 
shutdown. This is expected, since the major sensor changes did not occur until 200 
milliseconds (5 data samples) prior to normal engine shutdown. 

12. Test 901-173: Main Injector Fracture 

According to the Rocketdyne SAFD Phase II report, during stable operation at 92% of rated power level, 
LOX post 10, row 13 cracked through at the tip radius between the primary and secondary faceplates. Hot gas 
flow into the LOX post ignited and burned out the post. LOX pouring into the face coolant manifold caused 
burn-through of the primary and secondary faceplates, dumping face coolant into the hot gas manifold. 
Ejection of burner debris caused severe nozzle tube rupture (46 tubes). Fuel loss to the preburners coupled 
with engine control reactions to maintain MCC_PC caused the HPFT discharge temperature to exceed its 
redline, producing a premature cutoff at t = 201.17 seconds. (Test conducted on 4 April 1978, cutoff time: t = 
201.17 seconds.) 

CADS Data : A plot of the MCC_PC is shown in Figure A-12.1. A significant number of PIDs 
such as the HPFTDST, HPOTDSTB, FPB_PC, and OPB_PC were missing from the CADS 
data. 


Time Series Analysis : For the first 100 seconds of operation at 70% RPL, the ARMA models 
indicate nominal operation. After the power level change to 92% RPL, a number of 
Fig. A-11.5 FAILURE DETECTION USING ARMA MODELS FOR PARAMETER 
LPFT DISCHARGE PRESSURE FOR TEST 750-259 (40 sec window). 

Fig. A-11.7 THE CLUSTERING ALGORITHM RESULTS SHOW NO FALSE ALARMS 
FOR THE 902-463 NOMINAL DATA USING THE 750-259 SENSOR SUBSET. 

Fig. A-11.8 THE CLUSTERING ALGORITHM RESULTS FOR TEST 750-259. THE 
CORRELATION COEFFICIENTS REMAINED ABOVE THE DETECTION 
THRESHOLD UNTIL ENGINE SHUTDOWN. 

Fig. A-12.1 MCC PRESSURE (PID NO. 130) FOR TEST 901-173 

parameters including OPOV_ACT_POS, LPFP_SPD, HPFP_SPD, and HPOP_SPD start to 
show deviations from the nominal. OPOV_ACT_POS shows deviations from as early as 160 
seconds from the start. Figures A-12.2 through A-12.9 show the plots of data and the 
correlation functions of the residuals. 

Cluster Algorithm : Input to the clustering algorithm consisted of the thirteen selected sensors 
(see Table 3.2a) minus those missing sensors listed in Table A-1. The event threshold was 0.94 
and the fault detector was set for a five out of five event threshold. 

The performance of the algorithm for nominal data is shown in Figure A-12.10. No false 
alarms occurred for the given threshold. 

The correlation values, R, for 901-173 and the detection threshold are shown in Figure 
A-12.11. The R values remain above the detection threshold when the engine is operating at 
69% RPL. While the engine transitions to 90% RPL (at 100 seconds), the R values drop below 
the threshold at 102.1 seconds, and continue decreasing even though the engine maintains a 
constant power. 


13. Test 901-331: Main Injector Fracture 

According to the Rocketdyne SAFD Phase II report, during stable operation at 100% of rated power 
level, LOX post 79, row 13 failed in the 316L material at the inertial weld (which joins a 316L post to an 
INCO 718 interpropellant plate stub). Test data analysis reveals that the LOX post failure occurred first, and 
subsequently did major damage to the injector. Once the injector was damaged, a loss in C-star efficiency 
resulted and caused a reduction in MCC_PC. The engine control system responded by increasing the OPOV 
(Oxidizer Preburner Oxidizer Valve) open position. The increased LOX flowrate necessary to maintain the 
100% rated power level caused the HPOT discharge temperature to exceed its redline (1760 deg-R). The test 
was thus cut off prematurely at t = 233.14 seconds. (Test conducted on 15 July 1981, cutoff time: t = 233.14 
seconds.) 

CADS Data : A plot of the MCC_PC, shown in Figure A-13.1, displays the engine power 
profile. A plot of the LPOP discharge pressure, used for the ARMA fault detection, is shown in 
Figure A-13.2. 

Time Series Analysis : During mainstage operation at 100% RPL, failure detection by the 
ARMA models occurred at the time of the redline cutoff, as shown in Figure A-13.3. 

Cluster Algorithm : Input to the clustering algorithm consisted of the thirteen selected sensors 
(see Table 3.2a) minus those missing sensors listed in Table A-1. The event threshold was 0.669 
and the fault detector was set for a five out of five event threshold. 

The performance of the algorithm for nominal data is shown in Figure A-13.4. As shown in the 
plot, the threshold is exceeded during power transitions at 28 seconds, 41 seconds, 52 seconds, 
and 517 seconds. Each detection resulted in a false alarm. Similar to test 901-284 described in 
Section A-11, these false alarms are attributed to the absence of PID 59, the Preburner Pump 
Discharge Pressure. 
Fig. A-12.4 HPFP SPEED (PID NO. 261) FOR TEST 901-173 

Fig. A-12.5 HPOT SPEED (PID NO. 128) FOR TEST 901-173 

Fig. A-12.10 THE CLUSTERING ALGORITHM RESULTS SHOW NO FALSE ALARMS 
FOR THE 902-463 NOMINAL DATA USING THE 901-173 SENSOR SUBSET. 

Fig. A-12.11 THE CLUSTERING ALGORITHM RESULTS FOR TEST 901-173. 
FAULT DETECTION OCCURRED AT 102.1 SECONDS. 

The correlation coefficients, R, for 901-331 and the detection threshold are shown in Figure 
A-13.5. The R values are above the detection threshold during most of the test. At 50 
seconds from start, when the engine transitions from 100% RPL to 90% RPL, the R values 
drop below the threshold and cause a fault detection at 50.2 seconds. Since the engine was 
maintaining a steady power of 90% RPL, the detection at 50.2 seconds is considered valid and 
not a false alarm due to a transition. 

14. Test 901-222: Heat Exchanger Tube Leak 

According to the Rocketdyne SAFD Phase II report, at the close of engine start, the test was terminated 
by the heat exchanger outlet pressure minimum redline. It was concluded from the test data that the incident 
was caused by a leak in the heat exchanger coil. The leak occurred prior to, or during the early part of the start, 
as evidenced by the excessive coil pressure drop. The high pressure drop indicates increased mass flow. The 
coil failure was located near the heat exchanger inlet and discharge area, as shown by the hardware damage. 
Oxygen from the leak became entrained in the fuel-rich preburner combustion gas. The mixed gases were 
ignited when the turbine discharge gas reached a high enough temperature during the thrust build-up ramp. 
The radial accelerometer spike at 3.4 seconds indicates that ignition occurred as a detonation, and was near 
the heat exchanger inlet/outlet area. The continued combustion of the hydrogen-rich preburner combustion 
products and leaking oxygen caused burning of the coil; the change in nozzle flame pattern at 3.58 seconds 
shows evidence of metal burning. The heat exchanger coil pressure decayed to below the hot gas manifold 
pressure at 3.71 seconds, indicating that the heat exchanger coils were completely severed, with extensive 
communication occurring between the coil and hot gas. Hot gas flowing into the discharge end of the severed 
coil combusted in the discharge line, with oxygen from the bypass system. The discharge line burned through 
(4.185 seconds in the motion pictures) causing a rapid decay in discharge pressure at 4.212 seconds. (Test 
conducted on 6 December 1978, cutoff time: t = 4.33 seconds.) 

CADS Data : A plot of the MCC_PC, shown in Figure A-14.1, depicts the engine power profile. 

Regression Analysis : This failure occurred during the startup sequence. A model to predict 
the MCC_PC as a function of fuel and LOX flow rates is used to detect the failure, as shown in 
Figure A-14.2. 

Cluster Analysis : The engine did not achieve mainstage, therefore the cluster algorithm was 
not applicable. 


15. Test 901-340: T/A Duct Rupture 

According to the Rocketdyne SAFD Phase II report, during stable operation at 109% of rated power 
level, the following series of events occurred within the HPFTP: (1) the 2nd rotor platform seal and the T/A 
(Turn Around) duct inner wall fracture at t = 20.6 seconds from start; (2) the nut erodes, the 2nd rotor exit 
straightening vane breaks out, and the T/A duct inner wall fractures propagate at t = 277 seconds; (3) the 
washer lodges on the nozzle vane, and T/A duct sheet metal deflects at t = + 280 seconds; (4) major ruptures 
occur in the T/A duct at t = 290 seconds; (5) the T/A duct sheet metal flap breaks loose at t = 357 seconds. At t 
= 405.5 seconds the test was shut down due to a High Pressure Fuel Turbine (HPFT) discharge temperature 
redline. (Test conducted on 15 October 1981, cutoff time: t = 405.50 seconds.) 
CADS Data: A plot of the MCC_PC, Figure A-15.1, depicts the engine power profile. 


Time Series Analysis : During the mainstage operation at 109% RPL, abnormal behavior was 
detected approximately 16 seconds from the start for a number of parameters. Figures A-15.2 
through A-15.6 show the data and corresponding ARMA error signal correlation function 
plots. 

Cluster Algorithm : Input to the clustering algorithm consisted of the thirteen selected sensors 
(see Table 3.2a) minus those missing sensors listed in Table A-1. The event threshold was 0.94 
and the fault detector was set for a five out of five event threshold. 

The performance of the algorithm on the nominal data set 902-463 is shown in Figure A-15.7. 

There are no significant deviations in the plot and the correlation coefficient, R, values are all 
above the 0.936 threshold. No false alarms occurred on this data set. 

The correlation values for 901-340 and the detection threshold are shown in Figure A-15.8. 

The R values are above the detection threshold throughout the test until shutdown. 

16. Test SF6-01: MFV Crack 

According to the Rocketdyne SAFD Phase II report, during stable operation at 100% of rated power 
level, the Main Fuel Valve (MFV) on Main Engine-1 (ME-1), engine 2002, developed a cracked housing 
allowing hydrogen to leak into the boattail area. The loss of hydrogen caused the high pressure fuel turbine 
discharge temperature to rise above its redline and a shutdown was initiated. The failure occurred due to 
fatigue, initiated at small surface defects caused by either salt stress corrosion, surface oxidation, or hydrogen 
embrittlement. (Test conducted on 2 July 1979, cutoff time: t = 18.58 seconds.) 

CADS Data: No analysis could be performed because of corrupted data. 


Fig. A-15.5 FAILURE DETECTION USING ARMA MODELS FOR PARAMETER 
PBP DISCHARGE PRESSURE FOR TEST 901-340. 



Fig. A-15.6 FAILURE DETECTION USING ARMA MODELS FOR PARAMETER 
LPOP DISCHARGE PRESSURE FOR TEST 901-340. 
Fig. A-15.7 THE CLUSTERING ALGORITHM RESULTS SHOW NO FALSE ALARMS 
FOR THE 902-463 NOMINAL DATA USING THE 901-340 SENSOR SUBSET. 

Fig. A-15.8 THE CLUSTERING ALGORITHM RESULTS FOR TEST 901-340. THE 
CORRELATION COEFFICIENTS REMAINED ABOVE THE DETECTION 
THRESHOLD UNTIL ENGINE SHUTDOWN. 